The Deepsearch Database
Overview of the Deepsearch Database
The Deepsearch database exists to help us keep track of images,
supernova candidates, transformations between images, and similar
things. It is also used internally by some programs (e.g. the next
generation subtraction/scan software, and the Ivan/Rob lightcurve
software).
Where is it
Previously, the Deepsearch database was only available from the
computers of the supernova group at LBNL. This was largely for
historical reasons. Rather than having been designed from the start,
the Database grew like a fungus, supporting the needs of that group as
they were immediately defined. As the SCP expanded and started having
more of a presence elsewhere, it eventually became clear that this
database structure was insufficient.
Now (NOTE: The database is not yet up to date with this
documentation, so don't believe any of this) the Deepsearch database
has a more distributed structure. It is divided into "master" and
"satellite" sites. The master site is still at LBNL in Berkeley.
However, satellite sites are able to fully read and write the elements
of the database as it exists in Berkeley, using the provided routines
and functions.
There is a disadvantage, of course: as more people and sites use the
database, it's harder to keep control of what happens with it. Anybody
at any satellite site has the ability to seriously screw up the
database. As such, everybody should please be very careful to access
the database only through the provided interfaces,
routines, and programs. Do not develop your own without talking to the
people at LBNL who maintain the database, and don't try to do anything
clever. This will only cause everybody headaches.
What is it
The Deepsearch database has three components. By and large, nobody
should need to know this, since you will never, e.g., be sending raw SQL
commands. In principle, we could completely change the low-level
structure of the database, and you would never need to know it, so long
as we kept the high-level interfaces the same.
1. SQL Database
- A PostgreSQL server is running on lilys.lbl.gov at LBNL, which
maintains tables of information about images in the database,
geometric transformations between the images, and other things.
2. NetCDF files
- These are effectively flat files, although they're in a format that
makes it easy to support a keyword/variable structure, and which is
defined to have the same binary representation for all platforms.
This is where we store large amounts of information that wouldn't
make sense to put in an SQL table. E.g., FITS image headers are in
NetCDF files. This lets you read a header without having to read the
whole image; at the same time, since you will usually want the whole
header of an image at once, it wouldn't make sense to burden an SQL
table with entire FITS headers. The same goes for object catalogs:
you usually want the whole catalog for a given image at once, so it
makes sense to store that catalog in a flat file rather than in a
searchable SQL database.
3. Images
- The final component of the database is the actual FITS images
themselves. There are more than 100GB of FITS images in the Deepsearch
database currently.
How it works
SQL Database
- This one is the easiest. The SQL server is running in Berkeley, and
listens on the standard PostgreSQL socket. If you are an authorized
host, your machine can interact with this SQL database just like any
other authorized host. (Note however that you should
not abuse this! Use only the given and supported interface routines
and programs!)
NetCDF files: reading
- This one is trickier. The "master" copy lives at LBNL in Berkeley.
For network performance reasons, it doesn't make sense to send a
copy of each NetCDF file over the network each time a satellite site
wants to read it. Instead, the first time you read a NetCDF file,
it will be pulled over the network and stored locally. Thereafter,
when you try to read that NetCDF file, the reading routine will
check the datestamp on your local copy of the NetCDF file; if it
matches the datestamp of the master NetCDF file (which will have
been pulled from the SQL database), it just opens your local copy.
Otherwise, it requests a new copy of the NetCDF file from the master
server. (This date verification is one reason you should never try
to access any of the Deepsearch database files using anything other
than the supported programs and routines.)
NetCDF files: writing
- This is even trickier. I haven't fully figured out yet how this
will work. However, it will probably be just an extension of the
way NetCDF files are read.
Images: reading
- This works much like the reading of NetCDF files. The difference is
that while the local repository of NetCDF database files will all be
in a single location, you can define many disks to be local
repositories for Deepsearch images.
Images: writing
- This tends not to happen as often as you might think. Images tend
to be loaded into the database once, and then read many times. I
still haven't figured out the procedure for adding images to the
database from remote satellite sites. Stay tuned.
High-Level Deepsearch Database Access Routines
By and large, you should be only vaguely aware that most of these
routines are database access routines at all.
Web-based Access
Currently, there isn't any. Someday soon there will be.
IDL Routines
- tracker
- freadimage2
- readimsum
- freduceimage2
- ftransimages2
- getcandinfo
- ltelescope
- subng / scantng
- Quimby Zeropoint Routines
Deeplib C++ Routines
TBW
Standalone Programs/Scripts
There aren't any, currently.
Intermediate-Level Deepsearch Database Access Routines
Most people should never need to use any of these routines! Instead,
use the high-level routines defined above.
IDL Routines
getimsummary
getsubngsql
getcansummary
getallcandsinset
findsnimotn
addimagetodb
updateimagedb
readtransimages2
writetransimages2
lockfile
unlockfile
Deeplib C++ Routines
Standalone Programs
TBW.
Low-Level Deepsearch Database Access Routines: SQL
Never Use These! If you are using these, then you are doing
something wrong. The only people who should use these are the people
who really know what's going on at the lowest level with the
Deepsearch database. Right now, that includes maybe one or two
people. Those two people know who they are, and you aren't one of
them. Those two people should only use these routines when they are
making new intermediate-level routines. Everybody else should use
the high-level routines, or, if they really know they need to, the
intermediate-level routines.
IDL Routines
- querysql
- sendsqlcommand
- getsqlstruct
Deeplib C++ Routines
Standalone Programs
- psql
- gtksql
- libpq-fe
- Pg