The Deepsearch PC System Administration Guide

All of the deepsearch PCs are administered by me, Rob Knop. I've set them up so that they all have a uniform system administration environment. The idea is to make it easier on myself, so that I am administering a set of identically configured PCs, rather than having to administer nine or ten individual workstations. This is the reason I don't give everybody the root password and let them configure their own systems to their heart's content. If somebody really must do that, then they must also detach their machine from the rest of the Deepsearch systems and make it a standalone machine for which I am not responsible.


Conceptual Overview

The PCs are set up so that it is simple to administer a large number of them. As you read this manual, you are likely to think that I've made things unbearably complicated. "Why not just let me go in and edit the system file I need to edit?" you will cry. Bear with me. If I let you do that, then I'd be trying to keep track of n individual chaotically configured PCs. The overhead in the system I use makes it harder than managing one PC directly, but makes it much easier than managing nine or even six individual PCs.

Each PC workstation has the most essential system files stored locally. (This doesn't mean that any PC will fully function in the absence of the others, because things like home directories are shared.) It is a stock Red Hat system, modified by those changes to system files needed to adapt it for the environment we desire for SCP usage. We use the "fcm" system to handle installing customizations to a base Red Hat install. This serves two purposes. One, it makes it possible to see exactly which files have been modified from the standard Red Hat install, and hence quick to reapply them in the event that Linux needs to be reinstalled from scratch on a workstation. Two, because many of the modifications will be identical across all of the workstations, it saves a lot of hassle and work to store the modifications in a central location.

The SCP computers were formerly managed through a system called nerscify, which is very similar to fcm from a practical standpoint. As things currently stand, the nerscify system manages only the few remaining RedHat 7.1 systems, while fcm manages the RedHat 7.3 and RedHat Enterprise WS 3 systems.  Hopefully we will be able to eliminate the 7.1 systems soon and get everything onto fcm.

So, in short: every system file which is added to or different from those files installed by a stock Red Hat installation is stored in the fcm or nerscify database.


Types of PCs & Operational Requirements

All of the deepsearch PCs are designated as either "Workstations" or "Servers." Workstations are available for user login; servers are not. The servers are configured such that (for the most part) they can boot independently of any of the other computers. This is not true of the workstations.

Workstations

Workstations are computers available for direct use. Either you can sit down at their console and use them, or you can connect to them using ssh and run programs directly on them. They are all administered under the aegis of fcm or nerscify. For a computer to be a Linux workstation administered by us, it must satisfy a few requirements:


Future Plans

Here are some things I would like to do with the machines:


fcm Overview

fcm is a tool which keeps a database of new and modified system files, and which copies them into the running system directories when so instructed. The database is organized hierarchically: you define "classes" of configurations, each class includes a set of files to install into the system directories, and the classes are arranged in a tree. The class structure for the Deepsearch PCs as of 2004/04/20 looks like:


#                                     |--pivht--|--plearn
#                                     |         
#                                     |
#                                     |--single--|--cactus
#                            |--work--|          |--dara
#                   |--rh73--|        |          |--vics
#            |--pc--|        |        |          
#            |      |        |        |
# default -- |      |        |        |--smp--|
#            |      |        |                |--barneys
#            |      |        |                |--cha-am
#            |      |        |                |--filippos
#            |      |        |                |--flints
#            |      |        |                |--jimbean
#            |      |        |                |--lavals
#            |      |        |                |--milano
#            |      |        |                |--paloma
#            |      |        |                |--sauls
#            |      |        |                |--spats
#            |      |        |                |--sierra
#            |      |        |                |--venezia
#            |      |        |
#            |      |        |
#            |      |        |
#            |      |        |                   |--kirin
#            |      |        |                   |--lilys
#            |      |        |          |--norz--|--lococos
#            |      |        |          |        |--
#            |      |        |          |        |--rustica
#            |      |        |          |        |--spengers
#            |      |        |          |
#            |      |        |          |
#            |      |        |--server--|
#                   |        |          |
#                   |        |          |
#                   |        |          |--rz--|
#                   |        |
#                   |        |--snfactory--|--jsbach          
#                   |                      |--wfbach
#                   |
#                   |--rhes3--|--ajanta
#                             |--oscars
#                             |--skates
#                             |--topdog

Consider for example the machine barneys. When you run fcm on barneys, it will first install the configuration files for the "default" class, then the "pc" class, then the "rh73" class, then the "work" class, the "smp" class, and finally for the "barneys" class. Although this seems horribly complicated, it is actually fairly convenient. Those files (e.g. the password file and the automounter configuration file) which are shared between every SCP PC workstation may be put in the default class. Those files shared by every RedHat 7.3 workstation may be put in the rh73 class. Those files shared by only the smp machines (e.g. the smp aware kernel) may be put in the smp class. Finally, system files specific only to barneys (e.g. disks mounted and exported) are put within the barneys class.

If the same file appears in both a higher and a lower class, unless the first line of the file is "@APPEND@", the file in the lower class will overwrite the file in the higher class. So, for example, if there was an entry in the fcm database for the file /etc/aluminum_pipe for both the default and barneys classes, the version from the barneys class is what would end up installed on barneys. (If the first line of an entry in the fcm database is "@APPEND@", then the contents of that file are appended to whatever is already present on the system.)
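The override and append rules above can be illustrated with a small shell sketch. This is not how fcm itself is implemented; the function names apply_class and apply_chain are made up for the illustration, and it assumes that the "@APPEND@" marker line itself is not copied into the installed file.

```shell
#!/bin/sh
# Illustration only: mimics the overwrite/@APPEND@ rule, not fcm's real code.

# apply_class SRC_DIR DEST_DIR
# Install every file found under SRC_DIR into the same relative path in DEST_DIR.
apply_class() {
    src=$1
    dest=$2
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        mkdir -p "$dest/$(dirname "$rel")"
        if [ "$(head -n 1 "$src/$rel")" = "@APPEND@" ]; then
            # Assumption: the marker line is dropped, the rest is appended.
            tail -n +2 "$src/$rel" >> "$dest/$rel"
        else
            # A file from a lower class replaces whatever is already there.
            cp -p "$src/$rel" "$dest/$rel"
        fi
    done
}

# apply_chain CUSTOM_DIR DEST_DIR CLASS...
# Apply the classes in the order given, from highest (default) to lowest.
apply_chain() {
    custom=$1
    dest=$2
    shift 2
    for class in "$@"; do
        if [ -d "$custom/$class/install" ]; then
            apply_class "$custom/$class/install" "$dest"
        fi
    done
}
```

For barneys the chain would be given as default pc rh73 work smp barneys, so an /etc/aluminum_pipe in the barneys class is copied last and wins. Point DEST_DIR at a scratch directory, not at /, if you want to try this out.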

Directories Managed by fcm

The system directories are the ones managed by fcm. This includes the / and /usr filesystems on all of the PCs.

Directories NOT Managed by fcm

The following directories are examples of those not managed by fcm:

Directories not managed by fcm are free for full chaotic manipulation. The most relevant of these are /home/steege (the system disk where the fcm system and other things are stored) and /usr/local (software not installed using Red Hat's RPM which is runnable on all of the PCs). Those two filesystems are backed up infrequently to tape. Some of the other filesystems are backed up, some of them (such as user home directories) are the users' responsibility to back up if they care.

Modifying a System File Under the Aegis of fcm

If you need to modify a system file, for instance anything in /etc, you must do it using fcm. (Actually, most users shouldn't be doing this kind of thing anyway, but should be letting the sysadmin who really understands fcm do it.) If you edit it directly, your changes may be overwritten and lost the next time somebody runs fcm. If you are editing a stock Red Hat file (which thus isn't entered into the fcm database), then nobody will ever be able to tell that this is a modified file. The systems will start to diverge; as new workstations are installed, or if your system must be reinstalled, the changes may get lost. Chaos will ensue.

When you are going to modify a file under the aegis of fcm, you must first figure out which class the file most logically belongs to. Is it something that ought to be the same for every workstation (e.g. adding a user to the password file)? If so, then you should edit the "default" class in fcm. Is it something that applies to all the single processor machines (e.g. a new kernel)? Then edit the "single" class. Is it something specific only to one machine (e.g. mounting a new disk)? Then edit the class for that machine.

For each class, there is a directory:

/home/steege/fcm/custom/class/install/
that has all the customized files for that class. So, if you wanted to change /etc/aluminum_pipe for all of the Deepsearch PCs, you would edit the file:
/home/steege/fcm/custom/default/install/etc/aluminum_pipe

In many cases, the file will already exist in the fcm area. Suppose, however, that /etc/aluminum_pipe is installed with Red Hat, and that it hasn't been modified before. If you now want to modify it, you will have to suck it into the fcm area. This is quite simple. First do:

% cp -p /etc/aluminum_pipe /home/steege/fcm/custom/default/install/etc/
and then edit the file.
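If you do this often, the copy step can be wrapped in a small helper. The function name fcm_import and its optional third argument are hypothetical (nothing like it ships with fcm); the sketch only automates the cp -p above and refuses to clobber a file that is already in the fcm area.

```shell
#!/bin/sh
# Hypothetical helper, not part of fcm: copy a stock system file into a class.

# fcm_import /absolute/path/to/file CLASS [FCM_CUSTOM_DIR]
fcm_import() {
    file=$1
    class=$2
    custom=${3:-/home/steege/fcm/custom}
    case "$file" in
        /*) ;;
        *)  echo "need an absolute path: $file" >&2
            return 2 ;;
    esac
    target_dir="$custom/$class/install$(dirname "$file")"
    if [ -e "$target_dir/$(basename "$file")" ]; then
        # Already under fcm control; edit the existing copy instead.
        echo "already in fcm: $target_dir/$(basename "$file")" >&2
        return 1
    fi
    # -p preserves mode, ownership and timestamps, as in the command above.
    mkdir -p "$target_dir" && cp -p "$file" "$target_dir/"
}
```

For example, fcm_import /etc/aluminum_pipe default would create /home/steege/fcm/custom/default/install/etc/aluminum_pipe, ready for editing.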

Note that things can be subtle. It may be that there is an entry for /etc/aluminum_pipe in both the "default" class and the "rh73" class. The "default" class version would apply to every machine except for the RedHat 7.3 machines. You may then go in and edit the file for the "default" class, but after running fcm see that the file /etc/aluminum_pipe on your system hasn't changed. All of this makes sense if you really understand fcm. The rules are: know what you're doing, and tread with caution. If you don't know what you're doing, just let the sysadmin do it. If you are the sysadmin and you don't know what you're doing, then learn what you're doing, and quick.
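One way to avoid that surprise is to check which classes already carry a copy of the file before you edit anything. The following is a sketch; the function name fcm_classes_for is made up, and it relies only on the custom/class/install/ directory layout described above.

```shell
#!/bin/sh
# Hypothetical helper: list every class that has an entry for a given file.

# fcm_classes_for /absolute/path/to/file [FCM_CUSTOM_DIR]
fcm_classes_for() {
    file=$1
    custom=${2:-/home/steege/fcm/custom}
    for dir in "$custom"/*/install; do
        if [ -e "$dir$file" ]; then
            # The class name is the directory just above install/.
            basename "$(dirname "$dir")"
        fi
    done
}
```

If fcm_classes_for /etc/aluminum_pipe prints both default and rh73, then editing only the default copy will change nothing on a RedHat 7.3 machine.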

The fcm.conf file

To be written.

Variable Substitutions

To be written.

"By-Hand" Scripts

fcm mostly works by installing customized files. Sometimes this doesn't entirely work, and there may be a few additional steps you need to take to turn a stock Red Hat system into a happy SCP system. These steps are stored in shell scripts in /home/steege/fcm/byhand. Take a look at what's there, and hopefully you will understand how they work. Normally the byhand scripts aren't something you need to run over and over again.

Running fcm

fcm must be run as root.

First make sure that any emacs backup files have been cleaned out of the fcm database. We don't want to go spamming the workstations' system directories with those. You may do this with the following three commands:

% cd /home/steege/fcm/custom
% find . -name "*~" -not -path "*dist*" -print
% find . -name "*~" -not -path "*dist*" -ok rm \{\} \;
If you see something other than what you expect after the second command, stop and use your brain.

Now you may run fcm for the machine you are on. Do this as follows:

% cd /home/steege/fcm
% ./fcm -h machine_name -q
If you wish to run the "by-hand" scripts, add a "-b" to the end of the command line. (Like nerscify, fcm has a switch for rerunning LILO when it is done, but it is really never used; LILO only matters on the nerscify-managed machines, as described in the nerscify section below.)

Suppose you've just made a change that needs to be applied to all the Deepsearch PC workstations (e.g. adding a user to the password file). In that case, there is a shortcut script you may run as root:

cd /home/steege/fcm
./fcm-ws -q
This is a perl script that sequentially runs fcm on each PC workstation. It can take quite a while to run. You might want to look at the script to make sure that it's up to date and has every workstation listed in the rhosts variable at the top.

There is a similar script called fcm-server that can be run to apply changes to all of the servers. Additionally, there is a nerscify-suns script used on the suns. Note that europa must always be nerscified with the -b command line switch.


nerscify Overview

nerscify is a lot like fcm. In fact, it is the predecessor of fcm, and was also written by Rob Knop. fcm is better in a number of ways, but most of these are 'under the hood' improvements. For example, fcm tries to get away with a lot less copying of files than nerscify. Practically, the thing to know is that the redhat-7.1 machines use nerscify, and the redhat-7.3 and redhat enterprise machines use fcm. Thus, you need to do any updating of passwords and things like that using both systems. Hopefully, all of the redhat-7.1 machines will be updated to 7.3 someday, and we can retire nerscify completely. One thing to keep in mind with the 7.1 machines, and therefore with nerscify, is that they use lilo as a boot loader. Thus, any time there is the potential for a kernel to be moved around slightly it is extremely important that lilo be rerun. Fortunately, there is a switch in nerscify to do exactly this (actually, there is also one in fcm, but it's really never used).

% cd /home/steege/nerscify
% ./nerscify2 -h machine_name -q -l
The "-l" switch tells nerscify to run LILO after it's done with all its customizations. This is usually a good idea, because one of the things that usually gets installed is a kernel. (Even if it's the same kernel as was already there (e.g. from a previous iteration of nerscify), it's possible that nerscify will copy the kernel to different disk sectors, requiring a rerun of LILO for the computer to be able to boot.) Currently only zacharys, panisse, and the suns are managed by nerscify.


Updating Red Hat 7.1 or 7.3


All of this must be done as root, and generally in and around the directories /home/steege/src/rh-cd-7.1 and /home/steege/src/rh-cd-7.3.

Updating RedHat Enterprise WS 3

The Enterprise systems are handled a little differently because they are licensed. Basically, the lab maintains a license manager and RPM update system that we update against, instead of maintaining our own package tree. On the other hand, the updating is pretty easy. Simply run

up2date -u

on the machine to be updated. As of 2004/04/15 there are only 3 rhes machines.


Dealing with RPMs

All of this must be done as root, and generally in and around the directory /home/steege/src/rh-cd-version


Kernels

Compiling new kernels for the SCP Linux machines involves a few additional steps from what you might be used to doing on your home system. This is because new kernels, system maps, kernel module libraries, and LILO configurations are stored as part of our /home/steege/fcm hierarchy.

Kernel source is stored under /home/steege/src

When downloading new kernel source, untar the source into the /home/steege/src directory. First rename the existing /home/steege/src/linux directory to something else to avoid overwriting existing source.

Next apply any needed or relevant patches. Currently the only long-standing patch that we have is the patch for the RaidZone IDE system we currently have on our main home file server, Zacharys. As of this writing (14 December 2001) these patches only exist for the 2.2 series of Linux kernels.

Configuration files are stored in /home/steege/src/configs. The files in this directory contain the basic configuration information appropriate to each type of machine, single-processor, dual-processor, PII, PIII, etc. Copy the appropriate configuration file to the newly created /home/steege/src/linux directory which contains the source you're interested in. Modify the EXTRAVERSION entry in the Makefile to reflect the name of the kernel if you want.

For each of the classes of machines (single, smp, rz, norz, and laptop) do the following:

cd /home/steege/src/linux
make menuconfig

Alternatively you might want to use xconfig:

cd /home/steege/src/linux
make xconfig

Load the appropriate configuration file using the menu command. Then go through all of the options to make sure that things make sense and to double-check that the important things got loaded (e.g. ReiserFS support for Lococos, RaidZone if you're recompiling a 2.2 series kernel for Zacharys)

After finishing and exiting the menu, it's time to compile the kernel. Note that on a multi-processor machine you might want to use the -j3 option for make.

make dep
make clean
make bzImage
make modules

Installing the module libraries, kernel and System.map file is a little different because we want to install them in the appropriate place in our /home/steege/fcm hierarchy.

First the modules. Change SYSTEMTYPE to the appropriate type: single, smp, rz, norz, or laptop. [The 2.4.14 tag is an example, replace with the kernel version number of the kernel you are compiling.]

make INSTALL_MOD_PATH=/home/steege/fcm/custom/SYSTEMTYPE/install/ modules_install

Next copy the kernel image and system map file to the appropriate places, using the exact same names as you did above.

cp arch/i386/boot/bzImage /home/steege/fcm/custom/SYSTEMTYPE/install/boot/vmlinuz-2.4.14
cp System.map /home/steege/fcm/custom/SYSTEMTYPE/install/boot/System.map-2.4.14

Next edit the /boot/grub/grub.conf file for the appropriate system type.

vi /home/steege/fcm/custom/SYSTEMTYPE/install/boot/grub/grub.conf

Create a new entry for your kernel. Make sure the label doesn't conflict with an existing label, and don't set it as the default until you've tested it on a number of different machines.

The next time fcm is run on the machines, the new kernels, kernel module libraries and system map file will be automatically copied over and installed. It's a good idea to try some of them out at this point by running fcm on an individual machine to see if it comes up under the new kernel and if everything works (ethernet, local filesystems & NFS, X, loop device, etc.).
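Because the same three install commands are repeated for each system type, it can help to generate them rather than retype them. The sketch below only prints the commands from the steps above for a given SYSTEMTYPE and kernel version; it does not run anything. The function name kernel_install_cmds is made up for illustration.

```shell
#!/bin/sh
# Dry-run helper: print the install commands for one system type.

# kernel_install_cmds SYSTEMTYPE KERNEL_VERSION
kernel_install_cmds() {
    systemtype=$1
    kver=$2
    dest=/home/steege/fcm/custom/$systemtype/install
    echo "make INSTALL_MOD_PATH=$dest/ modules_install"
    echo "cp arch/i386/boot/bzImage $dest/boot/vmlinuz-$kver"
    echo "cp System.map $dest/boot/System.map-$kver"
}
```

Run it from anywhere, e.g. kernel_install_cmds smp 2.4.14, check the output, and then paste the commands into a root shell in /home/steege/src/linux after the matching build has finished.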


Automount

We've got lots and lots and lots of disks. It would be horribly cumbersome to mount all the NFS disks at boot time; that would doubtless cause all sorts of other problems such as hanging during boot, etc. At boot time, all that is mounted are the local disks (usually /, /usr, /scratch and local /home/astro* disks) and the most important system disks (/home/steege and /usr/local). Everything else is mounted by the automounter when somebody accesses it.

The automounter watches the directory /autofs. However, most users will want to access the automounted directories by going through /home. There should be a link in /home for every disk mounted by the automounter. This is transparent to most users. Only the system administrator has to worry about it, and it isn't even very hard for her.

The file /etc/auto.home is a list of all automounted disks, and where they are automounted from. For this to work, of course, the computers where they are mounted from must be exporting the proper filesystems. /etc/auto.home lists disks mounted both from Deepsearch PCs and Deepsearch Suns. Don't worry if local disks are listed in /etc/auto.home; the automounter will ignore them, and anyway the directory in /home will be the local mount point for the filesystem rather than a link to /autofs. (This is why you shouldn't get in the habit of referring to paths starting with /autofs, because they won't work on every system.) The same auto.home file is used for every SCP PC Workstation (with the exception of any laptops).
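Since every automounted disk is supposed to have an entry in /home, a quick consistency check is possible. The sketch below assumes the usual autofs map layout, in which the first whitespace-separated field of each non-comment line is the key (the disk name); the function name check_auto_links is made up. It reports any key that has neither a link nor a directory in the given home directory.

```shell
#!/bin/sh
# Hypothetical check: does every auto.home key have an entry in /home?

# check_auto_links MAP_FILE HOME_DIR
check_auto_links() {
    map=$1
    home=$2
    # Skip blank lines and comments; the first field is taken as the key.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$map" | while IFS= read -r key; do
        # -L catches links whose /autofs target is not mounted right now;
        # -e catches local mount points, which are real directories.
        if [ ! -e "$home/$key" ] && [ ! -L "$home/$key" ]; then
            echo "missing: $home/$key"
        fi
    done
}
```

On a workstation this would be run as check_auto_links /etc/auto.home /home; no output means nothing is missing.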


Other Miscellaneous Info (Passwords, etc.)

To be written.


The Mcdonalds Cluster

To be written.


Backups

There are three things that are backed up on a regular basis on our systems. First, the postgres database is backed up to a DAT on ajanta every night at about 3:30 am. Second and third, astro10 and astro9 are alternately backed up every few nights to the DLT drive on zacharys. All three sets of backups are run by crontab jobs.

Postgres

The DAT tape should be changed every morning to the next tape in sequence. There are two boxes of DAT tapes on the desk that should be alternated between on a weekly basis. The success of the backup can be checked as follows:

  1. Log into ajanta
  2. % cd ~postgres/backup
  3. % tail backuplog

    Check to see if the backup from last night was completed.
  4. Examine backup.out. In particular, look for any error messages.
  5. If both of these check out, write the date of the backup on the tape container.
If the crontab file is somehow lost, you need to log on as postgres and add it again. The command in the crontab file should eventually look like:

30 3 * * 1,2,3,4,5 /usr/bin/perl -w /home/db/postgres/backup/backupdata.perl > /home/db/postgres/backup/backup.out 2>&1


Remember -- never edit the crontab file directly!

astro10 and astro9

Only root can log into zacharys, so all of this will have to be done as root. There is a backup log, which should be located on top of zacharys, and which should be initialed on the success of any backup or the insertion of another backup tape. The log also tells you when to insert new tapes, and which tapes to insert. There are two sets of 8 tapes (tapeset A and tapeset B). Ideally, the one not in use should leave the lab completely to minimize the loss in a catastrophe. Failures should be noted on the log. The date of any successful backups should also be noted on the plastic tape container for that tape. To check the success of the most recent backup:

  1. ssh -lroot zacharys
    cd backup

  2. Examine the logfile with

    tail backuplog

  3. Check the size of the backup_astro?.out file (where ? is 9 or 10 depending which backup you are checking):

    ls -l backup_astro?.out

    The size of the file should be about the same size as the usage of the disk (under cpio). This can be checked with df. If dump is being used the .out file will be much smaller.
  4. If dump is being used, examine the .out file directly. If cpio or afio has been used, the file is too large to examine manually.

To generate a new log, do this:

  1. % /home/lilys/rknop/admin/makebackuplog.perl date > log.tex

    Where date is the date in YYYY MM DD format of the first Monday not covered by the previous log.
  2. % latex log.tex

  3. % dvips log.dvi

The crontab entries look like this:

30 22 * * 3 /usr/bin/perl -w /root/backup/backup /home/astro10 > /root/backup/backup_astro10.out 2>&1
30 22 * * 5 /usr/bin/perl -w /root/backup/backup /home/astro9 > /root/backup/backup_astro9.out 2>&1

Other backups

There are a few other disks that it is good to back up occasionally. astro1 should be backed up every few months, as well as any private disks you feel like protecting.


Mail on the deepsearch PCs

Currently, the mail server runs on zacharys. All of the other deepsearch machines have an MX entry that forwards mail to zacharys. Zacharys sticks incoming mail on /var/mail, which is exported to all of the other computers. Outgoing mail is handled by individual calls to sendmail, but only zacharys runs the sendmail daemon.

If you want to add a new computer and have it forward mail to zacharys, you need to talk to the LBL networking people and have them add an MX record for the new machine. You then need to set up zacharys to receive mail that has been targeted for that machine. The file to edit (and install with nerscify) for zacharys is /etc/mail/local-host-names. Just add the new machine name in the same fashion as the other entries.

The file that actually controls how sendmail is run on every machine is /etc/sendmail.cf. If you are sufficiently masochistic you could try to edit this file directly using the usual nerscify method, but fortunately there is a better way.

The sendmail.cf can be configured by building from .mc files, which are considerably easier to read and understand. There are two .mc files that are used on our computers. They both reside in /usr/share/sendmail-cf/cf/ on panisse. panisse.mc controls how panisse deals with mail, and scp.mc controls how the other computers deal with mail. Essentially, scp.mc tells the other computers to re-wrap any outgoing mail that they send so that it looks like it comes from panisse, and panisse.mc tells panisse how to handle incoming mail. For information on how to build sendmail.cf from these files, consult the sendmail web page. The right directory to be working in is /usr/share/sendmail-cf/cf


Procedure for Adding/Editing a System File

...was covered above, in "Modifying a System File Under the Aegis of fcm."



Procedure for Changing an IP

Sometimes, for whatever reason, you need to change the IP of an already existing computer. Here I will assume that you aren't changing the name, just the IP.

  1. Get the IP changed by talking to the lab folks: ip-request@lbl.gov
  2. Reboot the computer in single user mode by adding single to the boot line in grub.
  3. Edit /etc/sysconfig/networking and /etc/sysconfig/network-scripts/ifcfg-eth0 (assuming that the ethernet device is eth0) and /etc/hosts to reflect the new address.
  4. Reboot the machine
  5. Remove the old IP address and add the new one in both places in the hosts.allow file in fcm: /home/steege/fcm/custom/pc/install/etc/hosts.allow
  6. Do the same under nerscify if necessary
  7. Update the ssh key to refer to the new address in /home/steege/fcm/custom/default/install/etc/ssh/ssh_known_hosts*. Again, do the same on nerscify
  8. Edit pg_hba.conf for lilys: /home/steege/fcm/custom/lilys/install/home/db/postgres/data/pg_hba.conf
  9. fcm and nerscify all of the machines using the appropriate scripts
  10. Go around to all of the servers and do exportfs -r
  11. Restart the postgres server on lilys: /etc/rc.d/init.d/postgresql restart
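Before rerunning fcm in step 9 it is worth checking that no copy of the old address is still hiding somewhere in the configuration tree. A minimal sketch follows; the function name find_ip_refs is made up, and it relies on the recursive -r option found in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: list files under a tree that still mention an IP.

# find_ip_refs TREE IP_ADDRESS
find_ip_refs() {
    tree=$1
    ip=$2
    # -F: treat the dots literally; -w: do not match 192.0.2.10 when
    # looking for 192.0.2.1; -l: print file names only.
    grep -rlwF -- "$ip" "$tree" | sort
}
```

For example, find_ip_refs /home/steege/fcm/custom OLD_IP, and the same for /home/steege/nerscify/custom if nerscify machines are affected. Anything it prints still needs editing.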

Procedure for Adding a Disk

If you are going to add a disk to one of the PCs, you must take the following steps. Most of them (everything but playing with hardware) must be done as root. These instructions are specific to installing SCSI disks. Furthermore, these instructions apply to ext2/ext3 filesystems.

  1. Install the hardware. I won't say much more about this, but it often isn't as easy as all that. You should worry about SCSI termination, proper SCSI3-SCSI2 conversions, SCSI cable lengths, SCSI addresses, and numerous other things.

  2. If you add a disk with SCSI ID lower than any disk that's already on the system, lots of things are going to break. Adapt your fcm configuration files to reflect the new SCSI bus. I will say no more, because if you get this far you'd better know what you're doing (or the damage you've done is only beginning).

  3. Format the disk. Start with "fdisk" to partition the disk. Be very careful, or you will delete vital Deepsearch data! (I.e., make sure you're running it on the right SCSI device.) Then run "mke2fs" to make a filesystem. You should specify lots of extra options to "mke2fs" in order to make the disk maximally efficient. (Hints: we often have big files, so we don't need too many inodes; also, the system administrator does not need 5% of an 18GB disk set aside for his use.)

    Example:

    mke2fs -j -T largefile -m 1 /dev/something

  4. Choose a name for the disk. This is usually /home/astro*, where the * is a number which hasn't already been used by any of the other /home/astro* disks on any of the Deepsearch Suns or the Deepsearch PCs. All further examples below will assume you've chosen the name /home/astro666.

  5. Add the disk to the machine's fstab. Edit

    /home/steege/fcm/custom/machine_name/install/etc/fstab
    This requires you to know what you're doing.

  6. Create the mount point for the disk. If the disk is to be /home/astro666, you would do this with the commands:

    % mkdir /home/astro666
    % mkdir /home/steege/fcm/custom/machine_name/install/home/astro666

  7. Export the disk. Edit:

    /home/steege/fcm/custom/machine_name/install/etc/exports
    This requires you to know what you're doing.

  8. Put the disk in the automounter for all the other PCs. Edit:

    /home/steege/fcm/custom/pc/install/etc/auto.home

  9. Make the automounter link for all of the other machines. Again, if your disk is /home/astro666, you would do this with the commands:

    % cd /home/steege/fcm/custom/pc/install/home
    % ln -s /autofs/astro666 astro666

  10. fcm the machine upon which you are mounting the disk.

  11. Mount the disk (for now) by hand:

    % mount /home/astro666
    Ideally, you won't ever have to do this again, because the disk will be mounted automatically the next time the system starts.

  12. Do "df", look at the disk, cd to it, generally make sure things are OK.

  13. Manually export the disk:

    % /usr/sbin/exportfs -r
    This will never be necessary again, for henceforth it will automatically be done when the system starts.

  14. fcm the rest of the PCs (using fcm-ws as described above).

  15. If the new disk is a deepsearch data disk, it needs to be added to the DEEPIMAGEPATH. This is set by editing /home/lilys/idl/idl_setup.csh and /home/lilys/idl/idl_setup.sh and/or one of the other versions having to do with the nearby or snfactory setups.

  16. Add the disk to the Sun automounter in auto_direct.
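For step 4, a candidate disk number can be suggested from the automounter map. This is only a partial check and a sketch: it assumes that disk names appear as the first field of each auto.home line in the form astroN, the function name next_astro_number is made up, and it knows nothing about disks that appear only in the Sun automounter files or only in a machine's local fstab, so confirm the number by hand.

```shell
#!/bin/sh
# Hypothetical helper: suggest the next free /home/astro* number.

# next_astro_number MAP_FILE
next_astro_number() {
    awk '
        $1 ~ /^astro[0-9]+$/ {
            n = substr($1, 6) + 0      # numeric part after "astro"
            if (n > max) max = n
        }
        END { print max + 1 }
    ' "$1"
}
```

Typical use would be next_astro_number /home/steege/fcm/custom/pc/install/etc/auto.home.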


Handling RAID

A RAID array is a set of disks which have been combined together into one larger virtual disk. Typically this includes some sort of redundancy, so that you can have a disk die without actually losing any data. Usually, this means that the size of the larger virtual disk is less than the sum of the sizes of the disks that comprise the array. For example, on rustica, /dev/md0 is a RAID-5 array of four 50GB disks. The size of /dev/md0 is about 150GB (3x50GB).
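The rustica figure follows from the general RAID-5 rule that one disk's worth of space goes to parity, so N equal disks give (N - 1) disks of usable space. As a one-line shell check (the function name raid5_capacity is just for illustration):

```shell
#!/bin/sh
# Usable size of a RAID-5 array built from equal-sized disks.

# raid5_capacity NUMBER_OF_DISKS SIZE_OF_EACH_DISK
raid5_capacity() {
    # One disk's worth of space is consumed by parity.
    echo $(( ($1 - 1) * $2 ))
}

raid5_capacity 4 50    # four 50GB disks, as on rustica: prints 150
```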

Mostly we use software RAID, but currently there are two computers with some form of hardware IDE RAID: zacharys and drago. Zacharys uses a RAIDZone product, which unfortunately has not been very well supported. Information about the status of the RAIDZone can be found by logging onto zacharys and running /usr/sbin/raidzone. The hardware RAID on drago is based on a 3Ware Escalade 8506 SATA card, which should have better support. Information about the status of this RAID can be obtained either by rebooting drago or by logging onto drago and using a web browser to access http://localhost:1080/. This web page is only available from drago.

Software RAID is handled by the kernel, and most of the time doesn't require much thought. Any SCSI disks we have in RAID arrays are managed by the Linux kernel software RAID system. You can treat the /dev/md devices just like any other device. However, you must take some care when handling disks that are part of a RAID array. Some points of order:

The status of a software raid array can be found by examining the /proc/mdstat file on the machine it is mounted on:

% cat /proc/mdstat
Hopefully you will see all U-s. If you don't, one (or, God forbid, more) of the disks has failed. If only one disk has failed, the array should still be functional (we only use RAID-5 here), but it is imperative that you replace the non-functional disk first. Refer to the Software RAID HOWTO, but here are roughly the steps you should follow if only one disk has failed. If more than one has failed, you are in considerable trouble. Consult the HOWTO for minimal advice.
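Looking for missing U-s can be automated. The sketch below assumes the usual /proc/mdstat layout, in which each array starts with a line like "md0 : active raid5 ..." and is followed by a line ending in a status field such as [UUUU] or [UU_U]; the function name mdstat_degraded is made up. It prints the name of every array whose status field contains an underscore.

```shell
#!/bin/sh
# Hypothetical helper: print the md devices that have a failed member.

# mdstat_degraded [FILE]     (FILE defaults to /proc/mdstat)
mdstat_degraded() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }                       # remember the array name
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print dev }   # "_" marks a missing disk
    ' "${1:-/proc/mdstat}"
}
```

No output means every array shows all U-s. Something like this could be run from cron to mail the sysadmin, but that is a suggestion, not something currently set up.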

Ideally, my SCP Disk Usage Page should list which PCs are using RAID arrays, and which disks on those PCs are going into the RAID arrays.


Procedure for Adding a User

Don't believe anything you read in the Red Hat manual or elsewhere about how easy it is to add a user using some "adduser" script or some GUI utility. It will generally be harder to massage those things into something that fits with the Deepsearch PC setup than it will be to just do it the way it used to be done when men were real men, women were real women, and sysadmins were real geeks.

  1. Figure out the UID for the user. If the user is going to also have a Sun account, everybody's life will be MUCH happier if you give her the same UID on the PCs as she will have on the Suns. If she doesn't have her Sun account yet, you can give her a temporary UID, but once she gets her Sun account you're going to have to change her UID and the ownership of all the files she's created. So just pick a high number (as of this writing, 5 March 2001, we're in the 20,000s) and make it the same on both systems. You can often refer to the previous account created (generally the last line in the 'passwd' file) for a good place to start.

    Do the following steps as root on zacharys:

  2. Mount /home/steege on zacharys. (/home/steege lives on rustica and by default is not mounted on the other file servers, to avoid NFS hangs if a machine goes down. So, please, after you're through, "umount /home/steege" as it says below.)

    zacharys% mount /home/steege
  3. Add the user to the password files. Edit:

    /home/steege/nerscify/custom/pc/install/etc/passwd
    Give her GID 4028 (deepsrch). Make her shell /bin/tcsh unless she whines and wants something different. Make her home directory /home/lilys/username, unless there is a /home/machine_name directory for the workstation on her desk, in which case you might want to put her home directory there. (Use your brain.) Put "x" in the password field of the passwd file.

    Edit:
    /home/steege/nerscify/custom/server/install/etc/passwd
    and do the same as above except make the shell /bin/false to disable logins.
  4. Add the user to the shadow password file. Edit:

    /home/steege/nerscify/custom/pc/install/etc/shadow
    /home/steege/nerscify/custom/server/install/etc/shadow
    Put "*" in the password field.

  5. Add the user to the phyd group in the group file:

    /home/steege/nerscify/custom/pc/install/etc/group

  6. Nerscify zacharys (/home/lilys is currently on zacharys)

    zacharys% cd /home/steege/nerscify
    zacharys% ./nerscify2 -h zacharys -q -l
  7. Unmount /home/steege from zacharys. You must first change to a directory not on /home/steege.

    zacharys% cd
    zacharys% umount /home/steege
  8. Create the user's home directory in /home/lilys. Run chown and chgrp so that she owns her directory and set the directory permissions:

    zacharys% cd /home/lilys
    zacharys% mkdir (username)
    zacharys% chown (username):deepsrch (username)
    zacharys% chmod 755 (username)

    Do the following steps as root on sauls (or any other non server):

  9. fcm sauls so that it is aware of the new user. See the instructions above for zacharys

  10. Use "cd" to change into the ~username directory.

  11. Do "su - username".

  12. Install standard base startup files:

    % cp -p /etc/skel/* ./
    % cp -p /etc/skel/.* ./

  13. Do "passwd username" as root to set a default password for the user. This has the bonus side effect that it will copy the password file to all of the other PC workstations. You're still on sauls, right? This step really needs to be done from a workstation, because the workstations mount '/usr/local', and that is where the passwd.scp command resides that automatically keeps the password files synchronized. I like to make the initial password really nasty to increase the chance that the user will change it quickly.

  14. fcm all machines.

    For the servers:
    sauls% cd /home/steege/fcm
    sauls% ./fcm-server -q
    And now for the workstations:
    sauls% cd /home/steege/fcm
    sauls% ./fcm-ws -q -l
    In case you're wondering, the fcm-server script automatically mounts and unmounts /home/steege/ as necessary.
  15. Do "exit" to get out of the su.

  16. Tell the user to change her password, and nag her until she does.
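
Steps 1 and 3 above can be sketched as a pair of shell helpers: one suggests the next free UID from an existing passwd file, the other prints a passwd line that follows the conventions given in step 3 (GID 4028, home under /home/lilys, "x" in the password field). Both function names are hypothetical, and the cap of 60000 used to skip entries such as "nobody" is an assumption, not something taken from the Deepsearch setup. The output is only printed; you still paste it into the files by hand.

```shell
# Print one more than the highest UID found in a passwd-format file.
# $1 = passwd file, $2 = cap (assumed default 60000); UIDs at or above the
# cap and commented-out lines are ignored. next_uid is a hypothetical helper.
next_uid() {
    awk -F: -v cap="${2:-60000}" '
        /^#/ { next }
        $3 ~ /^[0-9]+$/ && $3 + 0 < cap && $3 + 0 > max { max = $3 + 0 }
        END { print max + 1 }
    ' "$1"
}

# Print a passwd line for a new user.
# $1 = username, $2 = UID, $3 = full name, $4 = shell (default /bin/tcsh).
# Use /bin/false as the shell for the server copy of the passwd file.
passwd_line() {
    printf '%s:x:%s:4028:%s:/home/lilys/%s:%s\n' \
        "$1" "$2" "$3" "$1" "${4:-/bin/tcsh}"
}
```

For example, "passwd_line jdoe 20042 'Jane Doe'" gives the line for the pc passwd file, and adding /bin/false as a fourth argument gives the line for the server passwd file. Remember that the UID should still match the Sun account if there is one.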


Procedure for Deleting a User

  1. Edit:

    /home/steege/nerscify/custom/pc/install/etc/passwd
    /home/steege/nerscify/custom/server/install/etc/passwd
    If you wish to disable the user's account, set her shell to /bin/false. If you wish to delete the account, comment out the line in the file by adding a "#" at the beginning.

  2. Edit:

    /home/steege/nerscify/custom/pc/install/etc/shadow
    /home/steege/nerscify/custom/server/install/etc/shadow
    If you are disabling the user's account, set the password field to "*" (without the double quotes). If you are deleting the user's account, comment out the line in the shadow file by adding a "#" at the beginning.

  3. Delete the user's home directory. Warning: You may not really want to do this! The user may still have files that the Deepsearch needs! In any event, it is generally considered polite to back the directory up to a tape first. (Use a 2GB 90m DDS tape, as those are the cheapest, if we're talking less than 2GB of data.)

  4. Find all other random directories left over on all Deepsearch disks in user and scratch partitions loaded with useless files that nobody else is ever going to care about, and delete them. This is a lot of work. Feel free to procrastinate on it for months. Again, it's polite to back up any directories you delete before actually deleting them.
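
The edits in steps 1 and 2 are simple enough to do in an editor, but they can also be sketched with awk. The two function names below are hypothetical. Each one prints the modified file to standard output rather than changing it in place, so you can look the result over before copying it back.

```shell
# Print FILE with USER's line commented out (for deleting an account).
# $1 = username, $2 = passwd or shadow file. Hypothetical helper.
comment_out_user() {
    awk -F: -v user="$1" '$1 == user { print "#" $0; next } { print }' "$2"
}

# Print a passwd-format FILE with USER's shell set to /bin/false
# (for disabling an account). $1 = username, $2 = passwd file.
# Hypothetical helper.
disable_shell() {
    awk -F: -v user="$1" '
        BEGIN { OFS = ":" }
        $1 == user { $7 = "/bin/false" }
        { print }
    ' "$2"
}
```

Both helpers match on the whole first field, so disabling "ann" leaves a user named "anna" alone.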


Procedure for Installing a New 7.3 Workstation

  1. The first step is to install the actual hardware. It is assumed that you know how to do this. If this is a new machine with a name that isn't currently registered to us, you need to let the lab networking folks know you need an IP address. It is no longer necessary to contact distributed printing about a new machine.

  2. Updating the CD image

  3. Update the kickstart image

  4. Use nerscify and fcm to add the address of the new machine to the hosts.allow files. Try /home/steege/nerscify/custom/pc/install/etc/ and /home/steege/fcm/custom/pc/install/etc/. Also make sure to edit the custom files for the servers. Add the IP address (if new) to the file pg_hba.conf on Lilys using fcm. Then add the new machine to the .shosts files: /home/steege/nerscify/custom/pc/install/root/.shosts and /home/steege/fcm/custom/pc/install/root/.shosts. Add the name of the new machine to the export lists in nerscify.conf and fcm.conf. Once these are updated on rustica and panisse, ssh to rustica and re-export the filesystem:

    [root@rustica /root]# /usr/sbin/exportfs -r

  5. Put the floppy in the new computer and boot it. With luck, everything will work. If not, good luck. I recommend the following partitions:

    / 1024MB
    /usr 4096MB
    swap 2048MB
    /scratch Remaining space

  6. Modify the fcm-ws script to include the new computer, so that you can use nerscify on it.

  7. Rebooting and customizing

  8. Configure Xwindows

    Edit the /etc/X11/XF86Config-4 file and add additional display modes as desired. Then copy the file to the fcm directory:

    cp -p /etc/X11/XF86Config-4 /home/steege/fcm/custom/<machine>/etc/X11/

  9. Adjust RPMS

    Sadly, at the moment the installation procedure doesn't quite work correctly under Red Hat 7.1. Some of the RPMs don't get installed correctly. There are a few things that need to be done by hand. Details can be found here. If you aren't installing a 7.1 system, you can ignore this.

  10. Deal with astro disks

    If the new machine exports any /home/astro disks, you will need to follow the procedures for adding a new disk if you want the other computers to be able to automount it.
  11. Add to passwd.scp

    Add the name of the new machine (if it is a new machine, and not a replacement for an old one with the same name) to /usr/local/bin/passwd.scp. This must be edited on rustica or lilys. Note that this file does not seem to be currently handled by nerscify.

  12. Mail

    If this machine is not a replacement for an old machine (i.e., it has a brand new, not previously used name), ask the LBL networking people to add an MX record to the DNS tables for the new machine forwarding all mail to panisse. Then, edit the local-host-names file as described above under Mail to get panisse to accept mail addressed to the new machine. In theory, the byhand part of the nerscify script will install the appropriate sendmail.cf, but you should check.

  13. Web stuff

    If this is a new machine, the computer should be added to the .htaccess lists on panisse (/home/panisse/www/htdocs/groupwork/collab/.htaccess) and lilys (/autofs/astro47/httpd/html/deepdb/.htaccess). Neither of these files is handled by fcm/nerscify.
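
Because the new machine's name has to land in so many files (hosts.allow, .shosts, passwd.scp, the .htaccess lists), a quick check for stragglers can help. The sketch below is one way to do it; files_missing_host is a hypothetical name, and the match is a plain substring search, so it cannot tell "drago" from "drago2".

```shell
# Print each listed file that does not mention HOST.
# $1 = host name, remaining arguments = files to check.
# A file that cannot be read is reported as missing the host.
# files_missing_host is a hypothetical helper, not an existing command.
files_missing_host() {
    host=$1
    shift
    for f in "$@"; do
        if ! grep -q -F -- "$host" "$f" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, pass it the new machine's name followed by the two .shosts files from step 4 and /usr/local/bin/passwd.scp from step 11; anything it prints still needs editing.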


Procedure for Installing a New RHEL 3 Workstation

  1. Again, start by installing the relevant hardware
  2. Make sure you have a license with an activation key. If a new license is needed, contact licenses.lbl.gov.
  3. The kickstart file
  4. Boot off of the boot CD. At the prompt type

    linux ks=(URL of ks file)
  5. I recommend the following partitions (assuming that no /home/astro type disk will be created on this computer):

    / 1024MB
    swap 2048MB
    /usr 5120MB
    /var 3096MB
    /scratch Remaining space

  6. Register the license using the instructions that the lab provides. I would recommend against disabling automatic kernel updates.
  7. Update using up2date -u.
  8. Install all of the RPMS in /home/steege/pkg/redhat-es3/i386. This may require some --force action.
  9. Set up fcm for the new computer and then run it.
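
For step 8, one cautious approach is to build the rpm command line first and read it before running it. The helper below only prints the command; rpm_install_command is a hypothetical name, and the choice of "rpm -Uvh" (upgrade or install, verbose, with progress marks) is an assumption about how you want the packages installed.

```shell
# Print (do not run) an rpm command that installs every .rpm in a directory.
# $1 = directory holding the RPMs, remaining arguments = extra rpm options
# such as --force. rpm_install_command is a hypothetical helper.
rpm_install_command() {
    dir=$1
    shift
    set -- rpm -Uvh "$@" "$dir"/*.rpm
    printf '%s\n' "$*"
}
```

Try "rpm_install_command /home/steege/pkg/redhat-es3/i386" first, and add --force as a second argument only if the plain command fails.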

Procedure for Restoring Lilys

To be written.


Procedure for Restoring a Workstation

To be written.


Boot Floppies

If you need to make a new boot floppy, do the following:

Insert a floppy.

cd /home/steege/rescue
dd if=emer_boot of=/dev/fd0
/usr/sbin/rdev -r /dev/fd0 49152
/usr/sbin/rdev /dev/fd0 /dev/fd0

Insert another floppy.

dd if=emer_root.gz of=/dev/fd0

Insert another floppy.

dd if=emer_util of=/dev/fd0
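
Before writing the floppies, it may be worth confirming that each image actually fits. The sketch below assumes standard 1.44MB floppies (2880 sectors of 512 bytes, i.e. 1474560 bytes); the function name fits_on_floppy is made up for illustration.

```shell
# Capacity of a 1.44MB floppy in bytes (assumption: 2880 sectors x 512 bytes).
FLOPPY_BYTES=1474560

# Succeed if the file named in $1 is no larger than one floppy.
# fits_on_floppy is a hypothetical helper, not an existing command.
fits_on_floppy() {
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$FLOPPY_BYTES" ]
}
```

For example, 'fits_on_floppy emer_root.gz || echo "too big"' run from /home/steege/rescue warns you before you waste a disk.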



Maintained and written by Alex Conley (AJConley@lbl.gov).