Deepsearch Software Docs: Images and the Database

Many FITS Images

Because this is an astronomy group, the deepsearch software revolves around keeping track of lots and lots of astronomical images. (Indeed, we have yet to find a good way of integrating our spectra into the database; right now they are tracked individually, mostly by Isobel Hook.)

The images themselves are stored as FITS files. There are far more of these images than would fit on a single disk, so they are scattered over tens of disks mounted on the various Suns in our group. If you have set your account up to run IDL (see Setting Up), then the environment variable DEEPIMAGEPATH contains the search path the deepsearch software uses when looking for images. From the Unix command line, issue the command

echo $DEEPIMAGEPATH

to see the directories where the software looks for images.
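To illustrate what "searching a path for an image" means, here is a minimal Python sketch. It is not the deepsearch code (which is written in IDL). It assumes that DEEPIMAGEPATH is a list of directories joined by the usual Unix path separator, which this page does not state, and the function name find_image is hypothetical.

```python
import os


def find_image(filename, search_path=None):
    """Return the first file named `filename` found on the search path, or None.

    Assumption: the search path is a string of directories joined by
    os.pathsep (":" on Unix). This page does not confirm that format.
    """
    if search_path is None:
        # Fall back to the environment variable described above.
        search_path = os.environ.get("DEEPIMAGEPATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a stray "::"
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Called as find_image("some_image.fits"), this returns the full path of the first match, or None if the file is not in any listed directory.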

Overview of the Image Database

The image database is simply a tool that keeps track of a lot of header information about all of our images. It is there to simplify finding the images we want, even if we don't know their location, their filenames, or even that they exist. You can, for instance, search the database for all images from a given telescope on a given night, or all images that include a given point on the sky.

Of course, it's easy to abuse this power, and do things that aren't wise. Some of the most important caveats you should keep in mind before using the image database are:

Images aren't necessarily unique!: One tempting use of the database is to do statistical or variability studies by getting (say) all images of the same star and seeing how the star's brightness changes. And, indeed, this is a valid use. However, do not assume that each image listed in the database is a different image. The same image could easily be listed twice! Why on earth would this happen? The most common reason is that the image went through two different reduction procedures. During earlier searches, we would send data back from Chile to search it in real time. Internet bandwidth considerations required us to use lossy compression. These degraded images got loaded into the database, since our search software needed them. Later, when we got the images back on tape, we'd reduce them again, more carefully, and load the better reduced image into the database. But the two images are not statistically different, and are certainly not independent measurements! Similar things have happened in other cases: because coadded images were loaded into the database in addition to the individual images in the sum (which can be very hard to handle automatically, since it won't be done the same way every time), or because images were reduced poorly the first time around but couldn't be deleted because other things depended on them. So be careful. One good thing to do is to look at the image coordinates, telescope, and date/time. If the three are the same for two images (or close enough in the case of the coordinates), then the two images are probably the same data (though perhaps reduced in different ways).
Images aren't necessarily background subtracted!: Once upon a time, all images in our database were "surfaced" or sky subtracted. The primary reason, so far as I can tell, is that our subtraction and search software requires background subtracted images. Unfortunately, some information is lost in background subtraction (also called surfacing), and while the tradeoffs made in surfacing may be best for searching for supernovae, they may not be best for other purposes. The process of relaxing the surfacing requirement has begun. There is a database field, "surfaced", available from the ims[] array. If that field is 1, then the image is surfaced; if it is 0, then it is not. For many purposes, the IDL routine autosurface will perform the background subtraction you want, but you now have the freedom to do a background subtraction you prefer for your purpose. (For those images reduced since this paradigm shift, anyway.)
Images aren't oriented the way you think they are!: Most, but not all, of the images in the database are oriented North Down, East Left. Yes, this is perverse. The reasons are historical, and I don't even fully know them. Most of our software implicitly assumes this orientation. It's safest for you to make no assumptions, but to figure out the orientation yourself (e.g. using APM/APS/USNO star matching).
Other: Yeah, I probably forgot a lot of things that should have been listed here.
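The duplicate check suggested in the first caveat (same telescope, same date/time, coordinates close enough) can be sketched in Python as follows. This is only an illustration under stated assumptions: the record keys name, telescope, datetime, ra, and dec are hypothetical and are not the actual database field names, and the default tolerance is an arbitrary placeholder.

```python
def probable_duplicates(images, tol_deg=0.01):
    """Return pairs of image names that are probably the same exposure.

    `images` is a list of dicts with hypothetical keys: "name",
    "telescope", "datetime", "ra" and "dec" (coordinates in degrees).
    Two images are flagged when telescope and date/time match exactly
    and both coordinates agree to within `tol_deg`.

    Simplification: RA is compared directly, ignoring the cos(dec)
    factor and the wrap-around at 360 degrees.
    """
    pairs = []
    for i, first in enumerate(images):
        for second in images[i + 1:]:
            same_telescope = first["telescope"] == second["telescope"]
            same_time = first["datetime"] == second["datetime"]
            close_on_sky = (
                abs(first["ra"] - second["ra"]) <= tol_deg
                and abs(first["dec"] - second["dec"]) <= tol_deg
            )
            if same_telescope and same_time and close_on_sky:
                pairs.append((first["name"], second["name"]))
    return pairs
```

A flagged pair is a hint to inspect the two images by hand, not proof that they are the same data; a coadded image and its constituents, for instance, would not necessarily share a date/time.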

Images: Orientation and Nature

All of the FITS images in the database include a header (which is mostly ignored, since the parts the software cares about are duplicated in the database) and the two-dimensional image data. These FITS images may be displayed and analyzed directly with a package such as SAOimage or a recent version of IRAF (with the proper massaging; ask Rob Knop or Greg Aldering for information).

For simplicity of software development, most of the images in our database are oriented the same way. For historical reasons, that orientation is, unfortunately, North Down, East to the Left. This means that you have to flip the image about a horizontal axis to get a more traditional orientation. (The IDL command rotate with a parameter of 7 performs this flip.) Normally, however, you will just keep the images in this "standard" orientation, since that is what the deepsearch software expects.
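For readers who want to see what a flip about a horizontal axis does to the pixel data, here is a small Python sketch on a nested list of rows. It is an illustration only, not part of the deepsearch software, and the function name flip_vertical is hypothetical.

```python
def flip_vertical(image):
    """Return a copy of `image` (a list of rows) with the row order reversed.

    Reversing the rows mirrors the image about a horizontal axis:
    the top row becomes the bottom row, and the columns stay put.
    """
    return [list(row) for row in reversed(image)]


# A tiny 2x3 "image" with made-up pixel values.
pixels = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = flip_vertical(pixels)  # [[4, 5, 6], [1, 2, 3]]
```

Applying the flip twice returns the original image, so the same operation converts in either direction between the database orientation and North Up, East Left.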

So that it's all (hopefully) in one place, the following are the implicit assumptions which are made about the images. An image to be properly loaded in our database should satisfy these assumptions:

Orientation: North down, East to the left.
Gain: the image must be divided by the gain, so that one count in the image in the database represents one electron. (Yes, this is annoying. In particular, it makes "legal" weighted sums impossible. Rknop is considering trying to relax this requirement, but will probably never do it.)
Location: Aside from having records in the database, the image itself should be somewhere on $DEEPIMAGEPATH.

The Database

Although the images represent the biggest hit on disk space, they're only half of the story. The deepsearch software keeps track of a vast quantity of things in its database.

The database is mostly documented elsewhere. For example:

Finding Images: How to figure out what images we have in our database, and how to find images of certain specifications.
Reading and displaying images: The most basic of operations.
The Deepsearch Database: Lower-level information about the Deepsearch database. This is more information than most people need to know. You only need it if you are doing something like defining new tables to go into the deepsearch database, or if you are maintaining the deepsearch database.

Last updated: 2001-January-26