The Deepsearch Database

Overview of the Deepsearch Database

The Deepsearch database exists to help us keep track of images, supernova candidates, transformations between images, and similar things. It is also used internally by some programs (e.g. the next generation subtraction/scan software, and the Ivan/Rob lightcurve software).

Where is it

Previously, the Deepsearch database was only available from the computers of the supernova group at LBNL. This was largely for historical reasons. Rather than having been designed from the start, the database grew like a fungus, supporting the needs of that group as they were immediately defined. As the SCP expanded and started having more of a presence elsewhere, it eventually became clear that this database structure was insufficient.

Now (NOTE: The database is not yet up to date with this documentation, so don't believe any of this) the Deepsearch database has a more distributed structure. It is divided into "master" and "satellite" sites. The master site is still at LBNL in Berkeley. However, satellite sites are able to fully read and write the elements of the database as it exists in Berkeley, using the provided routines and functions.

There is a disadvantage, of course: as more people and sites use the database, it's harder to keep control of what happens with it. Anybody at any satellite site has the ability to seriously screw up the database. As such, everybody should please be very careful to access the database only through the provided interfaces, routines, and programs. Do not develop your own without talking to the people at LBNL who maintain the database, and don't try to do anything clever. This will only cause everybody headaches.

What is it

The Deepsearch database has three components. By and large, nobody should need to know this, since you will never, e.g., be sending raw SQL commands. In principle, we could completely change the low-level structure of the database, and you would never need to know it, so long as we kept the high-level interfaces the same.

1. SQL Database

A PostgreSQL server is running on lilys.lbl.gov at LBNL, which maintains tables of information about images in the database, geometric transformations between the images, and other things.

2. NetCDF files

These are effectively flat files, although they're in a format that makes it easy to support a keyword/variable structure, and which is defined to have the same binary representation on all platforms. This is where we store large amounts of information that wouldn't make sense to put in an SQL table. E.g., FITS image headers are in NetCDF files. This lets you read a header without having to read the whole image; at the same time, since you will usually want the whole header of an image at once, it wouldn't make sense to burden an SQL table with entire FITS headers. The same goes for object catalogs: you usually want the whole catalog for a given image at once, so it makes sense to store that catalog in a flat file rather than in a searchable SQL database.
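
As a rough illustration of why this format is convenient, here is a minimal C sketch that pulls a single keyword out of a NetCDF file with the standard NetCDF C library, without touching any pixel data. The file name, and the idea of storing the keyword as a global attribute named EXPTIME, are assumptions for illustration only; the actual layout is hidden behind the supported routines, which you should use instead.

    #include <netcdf.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int ncid, status;
        size_t len;

        /* Open a (hypothetical) NetCDF file holding an image's FITS header. */
        status = nc_open("someimage_hdr.nc", NC_NOWRITE, &ncid);
        if (status != NC_NOERR) {
            fprintf(stderr, "nc_open: %s\n", nc_strerror(status));
            return 1;
        }

        /* Read one global attribute (an illustrative header keyword) without
           ever reading the image pixels. */
        status = nc_inq_attlen(ncid, NC_GLOBAL, "EXPTIME", &len);
        if (status == NC_NOERR) {
            char *val = malloc(len + 1);
            nc_get_att_text(ncid, NC_GLOBAL, "EXPTIME", val);
            val[len] = '\0';
            printf("EXPTIME = %s\n", val);
            free(val);
        }

        nc_close(ncid);
        return 0;
    }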

3. Images

The final component of the database is the actual FITS images themselves. There are more than 100GB of FITS images in the Deepsearch database currently.

How it works

SQL Database

This one is the easiest. The SQL server runs in Berkeley and listens on the standard PostgreSQL socket. If your machine is an authorized host, it can interact with this SQL database just like any other authorized host. (Note, however, that you should not abuse this! Use only the given and supported interface routines and programs!)
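
Under the hood, the supported routines are simply making an ordinary PostgreSQL connection and issuing queries. For illustration only (the database name and table here are assumptions, and you should not write code like this yourself), a minimal connection-and-query sketch in C with libpq-fe looks like this:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* Connect to the master server; the connection parameters are illustrative. */
        PGconn *conn = PQconnectdb("host=lilys.lbl.gov dbname=deepsearch");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* A trivial query; the table name is hypothetical. */
        PGresult *res = PQexec(conn, "SELECT count(*) FROM images;");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("images in database: %s\n", PQgetvalue(res, 0, 0));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }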

NetCDF files: reading

This one is trickier. The "master" copy lives at LBNL in Berkeley. For network performance reasons, it doesn't make sense to send a copy of each NetCDF file over the network every time a satellite site wants to read it. Instead, the first time you read a NetCDF file, it is pulled over the network and stored locally. Thereafter, when you try to read that NetCDF file, the reading routine checks the datestamp on your local copy; if it matches the datestamp of the master NetCDF file (which will have been pulled from the SQL database), it just opens your local copy. Otherwise, it requests a new copy of the NetCDF file from the master server. (This date verification is one reason you should never try to access any of the Deepsearch database files using anything other than the supported programs and routines.)
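
Schematically, the read-side logic is something like the following C sketch. The two helper functions are hypothetical stand-ins for the SQL datestamp lookup and the network fetch; this is not the actual code, just the idea.

    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    /* Hypothetical stand-ins: in the real system the master datestamp comes
       from an SQL query, and the fetch pulls the file over the network. */
    static time_t master_datestamp(const char *name) { return 0; /* placeholder */ }
    static void fetch_from_master(const char *name, const char *localpath)
    {
        printf("pulling %s from the master server into %s\n", name, localpath);
    }

    /* Use the local copy only if its datestamp matches the master's;
       otherwise pull a fresh copy first. */
    const char *local_netcdf(const char *name, const char *localpath)
    {
        struct stat st;
        if (stat(localpath, &st) != 0 || st.st_mtime != master_datestamp(name))
            fetch_from_master(name, localpath);
        return localpath;   /* caller opens this with the NetCDF library */
    }

    int main(void)
    {
        local_netcdf("someimage_hdr", "/local/netcdf/someimage_hdr.nc");
        return 0;
    }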

NetCDF files: writing

This is even trickier. I haven't fully figured out yet how this will work. However, it will probably be just an extension of the way NetCDF files are read.

Images: reading

This works much like the reading of NetCDF files. The difference is that while the local repository of NetCDF database files is all in a single location, you can designate many disks as local repositories for Deepsearch images.
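
Conceptually, the reading routines just search the configured repository disks for a readable local copy before falling back to the master server. A sketch in C, with an invented repository list (the real list is site configuration):

    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative only: the real repository list is site configuration. */
    static const char *repos[] = { "/data1/deepimages", "/data2/deepimages", NULL };

    /* Look for a readable local copy of an image on each repository disk.
       Returns 1 and fills path if found, 0 if the image would have to be
       pulled from the master server. */
    int find_local_image(const char *imagename, char *path, size_t pathlen)
    {
        for (int i = 0; repos[i] != NULL; i++) {
            snprintf(path, pathlen, "%s/%s", repos[i], imagename);
            if (access(path, R_OK) == 0)
                return 1;
        }
        return 0;
    }

    int main(void)
    {
        char path[1024];
        if (find_local_image("someimage.fits", path, sizeof path))
            printf("found local copy: %s\n", path);
        else
            printf("no local copy; would fetch from the master\n");
        return 0;
    }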

Images: writing

This tends not to happen as often as you might think. Images tend to be loaded into the database once, and then read many times. I still haven't figured out the procedure for adding images to the database from remote satellite sites. Stay tuned.

High-Level Deepsearch Database Access Routines

By and large, you should only be vaguely aware that most of these routines are database access routines at all.

Web-based Access

Currently, there isn't any. Someday soon there will be.

IDL Routines

tracker
freadimage2
readimsum
freduceimage2
ftransimages2
getcandinfo
ltelescope
subng / scantng
Quimby Zeropoint Routines

Deeplib C++ Routines

TBW

Standalone Programs/Scripts

There aren't any, currently.


Intermediate-Level Deepsearch Database Access Routines

Most people should never need to use any of these routines! Instead, use the high-level routines defined above.

IDL Routines

getimsummary
getsubngsql
getcansummary
getallcandsinset
findsnimotn
addimagetodb
updateimagedb
readtransimages2
writetransimages2
lockfile
unlockfile

Deeplib C++ Routines

Standalone Programs

TBW.


Low-Level Deepsearch Database Access Routines: SQL

Never Use These! If you are using these, then you are doing something wrong. The only people who should use these are the people who really know what's going on at the lowest level with the Deepsearch database. Right now, that includes maybe one or two people. Those two people know who they are, and you aren't one of them. Those two people should only use these routines when they are making new intermediate-level routines. Everybody else should use the high-level routines, or, if they really know they need to, the intermediate-level routines.

IDL Routines

querysql
sendsqlcommand
getsqlstruct

Deeplib C++ Routines

Standalone Programs

psql
gtksql
libpq-fe
Pg