The McDonalds Cluster

The McDonalds Cluster is a cluster of five 400 MHz Pentium II machines, each with 256MB of RAM. They are on a private network, hidden behind lilys, and cannot be used as standard interactive workstations the way ajanta, jimbean, and the rest can. All processes on the McDonalds cluster are managed by a queuing system called PBS.

Writing jobs to run on the McDonalds Cluster

In order to successfully run a task or a program on the McDonalds cluster, you must be able to run it non-interactively in the background, without requiring an X Windows display or any input from the user. Once your job is in this state, it is usually pretty easy to get it to run under PBS.
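A quick way to test whether your program meets this requirement is to run it detached from the terminal and see whether it finishes without complaint. For example, a csh sketch (where "myprogram" stands in for whatever you actually want to run):

% nohup ./myprogram < /dev/null >& test.log &

If it runs to completion this way, with all of its output in test.log, it should be ready to run under PBS.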

In order to run a job under PBS, create a shell script containing the commands you want to run. This shell script should look something like:

# Keep the output file in your home directory, and merge stderr into stdout
#PBS -k oe -j oe
# Run on a single node
#PBS -l nodes=1

hostname
echo "Hello, world!"
Substitute the actual work you want to do for the echo command. Normally, when I do this, I just have a single line that runs a Perl script that does the actual work.

Once you've written your PBS shell script (which I shall assume you called "myjob.pbs"), you can submit it to the queue with the command:

% qsub myjob.pbs
This command must be run on lilys! It won't work from any other computer. This is one case where you actually should log into lilys, rather than another of the PCs. There will be a brief pause, and then the computer will output the job number in a format something like:
477.lilys
477 is the "job number" in this case.
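If you are submitting jobs from a script and want to save the job number, you can strip off the ".lilys" part with cut; a csh sketch:

% set jobnum = `qsub myjob.pbs | cut -d. -f1`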

Once the job starts, the output (standard out and standard error) will be written to a file in your home directory named "myjob.pbs.o477" (where the actual number of the job will be in place of 477, and the actual name of your PBS script will be in place of "myjob.pbs"). You can "tail -f" this file to monitor its progress while it's running, and you can "more" or "less" this file after the job is done to see what your program printed while it was running.
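For example, to watch the output of the job from the example above while it runs:

% tail -f ~/myjob.pbs.o477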

Note that your job will be unceremoniously killed if it runs "too long." How long "too long" is depends on the queue; see which queue to use below.

Checking the queue status

Note that because PBS is a queuing system, your job may not run immediately. Somebody else may have already filled up the queue, keeping all of the computers busy, so your job will have to wait to run. You can see what is in the queue, and which jobs are running, with the command:

% qstat -an
Note that this qstat command will work from all of the PCs, not just from lilys. You should be able to find your job somewhere in the queue. If you can't, that means that the computer has run it already, and it's done! Look at the aforementioned output file to find out what happened.
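If a lot of jobs are queued, you can pick out just your own by filtering on your username (substitute your own username for rknop):

% qstat -an | grep rknop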

Which queue to use

There are currently four queues defined: scp, short, long, and brobdingnagian. The default queue (to which your job will be submitted if you don't specify another) is scp. short is for quick jobs; it has a higher priority. long is for jobs that may need to run for many hours; it has a lower priority. brobdingnagian is for truly enormous jobs that may need to run for days; it has the lowest priority. The following table summarizes the queues:

Queue            Default Walltime   Maximum Walltime   Priority
scp              10 minutes         4 hours              0
short            5 minutes          20 minutes           1
long             4 hours            24 hours            -1
brobdingnagian   24 hours           80 hours            -2

The "walltime" of your job is the amount of time it will be allowed to run before it is unceremoniously killed. This is time as measured by a "wall clock" (hence "walltime"), not the amount of CPU time used. So, by default, when you submit a job to the scp queue, it will be terminated if it doesn't finish within 10 minutes. You can ask to increase the walltime limit up to the maximum for the queue by specifying the "walltime" option (see Additional PBS Options below).
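For example, adding this directive to your script would let a job in the default scp queue run for a full hour:

#PBS -l walltime=01:00:00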

Passing arguments to PBS scripts

Sometimes you will write one PBS script that you want to submit many, many times, operating (for example) on a different image file each time. It is possible to just write a whole truckload of scripts named "doimg001.pbs", "doimg002.pbs", "doimg003.pbs", etc. However, you can also pass arguments to the PBS script in environment variables. To do this, you have to tell PBS to preserve the environment variable with the "-v" option. Here is an example:

#PBS -k oe -j oe
#PBS -l nodes=1
#PBS -v IMAGE_TO_DO

hostname
perl ~rknop/bin/reduce_data.perl $IMAGE_TO_DO
In this example, the value of the environment variable IMAGE_TO_DO is passed as an argument to the Perl script. You would then submit this job (assuming the example script is named "doimg.pbs") with the command:
% setenv IMAGE_TO_DO img001.fits ; qsub doimg.pbs
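To submit a whole batch of images, you can wrap this in a loop; here is a csh sketch, assuming your images match the pattern img*.fits:

foreach img ( img*.fits )
    setenv IMAGE_TO_DO $img
    qsub doimg.pbs
end

Each pass through the loop resets IMAGE_TO_DO and submits another copy of the job; qsub records the value of the variable at submission time.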

Additional PBS Options

There are lots of other options you can give to PBS with "#PBS" directives at the top of your PBS scripts. For example, to specify a different queue for the example script of the previous section, you would use the -q option as follows:

#PBS -k oe -j oe
#PBS -l nodes=1
#PBS -v IMAGE_TO_DO
#PBS -q long

hostname
perl ~rknop/bin/reduce_data.perl $IMAGE_TO_DO

These are some of the key options you might want to specify to PBS:

-q queue
Name of the queue to submit to. See the section on which queue to use.

-l walltime=walltime
Ask to increase the walltime your job has before it is terminated. Keep this at or below the maximum for the queue you're using, or your job will never be started. Specify the time as hours, minutes, and seconds in the format hh:mm:ss. For example, to ask for your job to be allowed 1 hour and 30 minutes, you'd specify -l walltime=01:30:00.

-v variable
Pass the named environment variable through to the job run by the queue. By default, only a handful of core environment variables are passed to jobs on the cluster, for security reasons. If you want to pass arguments in environment variables, you have to explicitly tell PBS to keep each such variable with a -v flag.

-p priority
You can manually lower the priority of your jobs. If you're doing a lot of things at once, and want some to be done by the queue before others, give the others a lower priority with a flag like "-p -1". You can try giving yourself a higher priority, but that shouldn't work. (If it does, I will try to fix it so it won't. :) )
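Putting several of these together: here is a sketch of a script (reusing the IMAGE_TO_DO example from above) that runs in the long queue, asks for a twelve-hour walltime, and lowers its own priority:

#PBS -k oe -j oe
#PBS -l nodes=1
#PBS -q long
#PBS -l walltime=12:00:00
#PBS -v IMAGE_TO_DO
#PBS -p -1

hostname
perl ~rknop/bin/reduce_data.perl $IMAGE_TO_DO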

Disk Usage on the McDonalds Cluster

Read this section carefully. Careless use of network-mounted disks from the McDonalds cluster will both excessively load the LBL network and gratuitously slow down your jobs as they access files over inefficient network paths.

The McDonalds cluster is on a private network behind lilys. The file server rustica is also on this private network (as well as on the general LBLnet; it has two ethernet cards). This means that each cluster node can access any of rustica's disks through its direct network connection to rustica. However, access to any disk outside of rustica (including disks on the Suns like /home/astro17, and disks on other PCs like /home/astro10 or /home/astro23) must bounce through lilys. This loads not just the network connection from the cluster node to lilys, but also lilys' CPU and the network connection from lilys to the rest of the world.

For reading short things like IDL programs, compiled C programs, parameter files, and the like, this is not a big deal; read those files from wherever in the world they are. However, for big data files like FITS images, you will be more efficient and a better LBL net citizen if you put those files on a disk which is on lilys before accessing them from a McDonalds cluster job. See Rob's page on SCP disk usage to figure out which scratch and user disks are mounted on lilys.

For storage of temporary files, especially big temporary images, you should use a disk which is not on the network at all, but is local to the cluster node where your job is running. When your job starts, a directory /tmp/yourname is created. There should be more than 1.5 gigabytes of space available in this directory. Write any temporary files here. This directory will be automatically deleted when your PBS job finishes.

Note that you may even want to write big output files (e.g. output images in a flatfield program) to your /tmp/yourname directory, and then copy them out to a disk on lilys in one go at the very end, after all of the processing is done. One big copy over the network at the end may be more efficient than lots and lots of little writes over the network the whole way through. What's more, if you are going to be submitting lots of jobs that will all need to copy files over the network, you might want to implement some sort of hacked-up "network locking" to avoid bogging down lilys and slowing down all of your jobs. If you are interested in this sort of thing, talk to Rob, who did this on the 32-node FTG cluster during the nearby search.
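Here is a sketch of that pattern as a PBS script. The names are hypothetical: flatfield.perl stands in for your processing program, "yourname" for your username, and /home/astroNN for a disk that is actually mounted on lilys (see Rob's SCP disk usage page for the real disk names):

#PBS -k oe -j oe
#PBS -l nodes=1
#PBS -v IMAGE_TO_DO

# Do all of the work in the local temporary directory...
cd /tmp/yourname
perl ~rknop/bin/flatfield.perl /home/astroNN/yourname/$IMAGE_TO_DO flat.fits

# ...then copy the result out over the network in one go at the end.
cp flat.fits /home/astroNN/yourname/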


rknop@lbl.gov
Last Modified: 2000-October-04