The Deepsearch PC System Administration Guide

All of the deepsearch PCs are administered by me, Rob Knop. I've set them up so that they all have a uniform system administration environment. The idea is to make it easier on myself, so that I am administering a set of identically configured PCs, rather than having to administer nine or ten individual workstations. This is the reason I don't give everybody the root password and let them configure their own systems to their heart's content. If somebody really must do that, then they must also detach their machine from the rest of the Deepsearch PCs and make it a standalone machine for which I am not responsible.

Contents:

Conceptual Overview
Types of PCs & Operational Requirements
Future plans
fcm overview
Nerscify Overview
Updating Red Hat
Dealing with RPMs
Kernels
The Automounter
Other Miscellaneous Info (passwords, etc.)
Backups
Mail
Procedure for Adding/Editing a System File
Procedure for changing IP address
Procedure for Adding a Disk
Handling RAID
Procedure for Adding a User
Procedure for Deleting a User
Procedure for Installing a New 7.3 Workstation
Procedure for Installing a New RHEL 3 Workstation
Procedure for Restoring Lilys
Procedure for Restoring a Workstation
Boot Floppies
Shutting Down the Deepsearch Computers

Conceptual Overview

The PCs are set up so that it is simple to administer a large number of them. As you read this manual, you are likely to think that I've made things unbearably complicated. "Why not just let me go in and edit the system file I need to edit?" you will cry. Bear with me. If I let you do that, then I'd be trying to keep track of n individual chaotically configured PCs. The overhead in the system I use makes it harder than managing one PC directly, but makes it much easier than managing nine or even six individual PCs.

Each PC workstation has the most essential system files stored locally. (This doesn't mean that any PC will fully function in the absence of the others, because things like home directories are shared.) It is a stock Red Hat system, modified by those changes to system files needed to adapt it for the environment we desire for SCP usage. We use the "fcm" system to handle installing customizations to a base Red Hat install. This serves two purposes. One, it makes it possible to see exactly which files have been modified from the standard Red Hat install, and hence makes it quick to reapply them in the event that Linux needs to be reinstalled from scratch on a workstation. Two, because many of the modifications will be identical across all of the workstations, it saves a lot of hassle and work to store the modifications in a central location.

The SCP computers were formerly managed through a system called nerscify, which is very similar to fcm from a practical standpoint. As things currently stand, the nerscify system manages only the few remaining RedHat 7.1 systems, while fcm manages the RedHat 7.3 and RedHat Enterprise WS 3 systems. Hopefully we will be able to eliminate the 7.1 systems soon and get everything onto fcm.

So, in short: every system file which is added to or different from those files installed by a stock Red Hat installation is stored in the fcm or nerscify database.

Types of PCs & Operational Requirements

All of the deepsearch PCs are designated as either "Workstations" or "Servers." Servers are not available for user login; workstations are. The servers are configured such that (for the most part) they can boot independently of any of the other computers. This is not true of the workstations.

Workstations

Workstations are computers available for direct use. Either you can sit down at their console and use them, or you can connect to them using ssh and run programs directly on them. They are all administered under the aegis of fcm or nerscify. For a computer to be a Linux workstation administered by us, it must satisfy a few requirements:

It must be on the network at LBL
It must be under the aegis of Nerscify or fcm, with the same password file and security configuration as the other Deepsearch PCs
It must be online and running Linux all the time. OK, computers crash, and sometimes you have to turn them off briefly to physically move them. However, I will not administer dual-boot Windows systems that are only online running Linux a fraction of the time. Having systems which I'm supposed to administer but which I can't connect to at any given time breaks a lot of the automatic procedures that keep the systems synchronized with each other, and greatly increases the headaches in administering a large number of computers.
Exception to the above: laptops. Obviously, laptops aren't going to always be online and be on the LBL net. However, I use fcm as a convenience for managing those systems as well. They are a special case, however, and aren't as well integrated into the whole system as the other computers.
It needs to have a fixed IP address. Our security model is based around fixed addresses. Laptops are again an exception to this.

Future Plans

Here are some things I would like to do with the machines:

Retire the Redhat 7.1 Machines. There are two right now: panisse and zacharys. Zacharys is a problem because its hardware IDE RAID (a RAIDZone product) appears to be incompatible with more recent versions of RedHat. Note that this isn't simply that there are no Linux-2.4 Kernel patches for RAIDZone, but that there seems to be some incompatibility with grub.
Switch the remaining RedHat 7.3 machines over to RedHat Enterprise.
Replace the password scheme with something better, most likely LDAP.
Figure out what is the fate of our mail server. Should it become a permanent repository?

fcm Overview

fcm is a tool which keeps a database of new and modified system files, and which copies them into the running system directories when so instructed. You define "classes" of configurations; each class includes a set of files to install into the system directories. The classes are organized hierarchically. The class structure for the Deepsearch PCs as of 2004/04/20 looks like:


#                                     |--pivht--|--plearn
#                                     |         
#                                     |
#                                     |--single--|--cactus
#                            |--work--|          |--dara
#                   |--rh73--|        |          |--vics
#            |--pc--|        |        |          
#            |      |        |        |
# default -- |      |        |        |--smp--|
#            |      |        |                |--barneys
#            |      |        |                |--cha-am
#            |      |        |                |--filippos
#            |      |        |                |--flints
#            |      |        |                |--jimbean
#            |      |        |                |--lavals
#            |      |        |                |--milano
#            |      |        |                |--paloma
#            |      |        |                |--sauls
#            |      |        |                |--spats
#            |      |        |                |--sierra
#            |      |        |                |--venezia
#            |      |        |
#            |      |        |
#            |      |        |
#            |      |        |                   |--kirin
#            |      |        |                   |--lilys
#            |      |        |          |--norz--|--lococos
#            |      |        |          |        |--
#            |      |        |          |        |--rustica
#            |      |        |          |        |--spengers
#            |      |        |          |
#            |      |        |          |
#            |      |        |--server--|
#                   |        |          |
#                   |        |          |
#                   |        |          |--rz--|
#                   |        |
#                   |        |--snfactory--|--jsbach          
#                   |                      |--wfbach
#                   |
#                   |--rhes3--|--ajanta
#                             |--oscars
#                             |--skates
#                             |--topdog

Consider for example the machine barneys. When you run fcm on barneys, it will first install the configuration files for the "default" class, then the "pc" class, then the "rh73" class, then the "work" class, the "smp" class, and finally for the "barneys" class. Although this seems horribly complicated, it is actually fairly convenient. Those files (e.g. the password file and the automounter configuration file) which are shared between every SCP PC workstation may be put in the default class. Those files shared by every RedHat 7.3 workstation may be put in the rh73 class. Those files shared by only the smp machines (e.g. the smp aware kernel) may be put in the smp class. Finally, system files specific only to barneys (e.g. disks mounted and exported) are put within the barneys class.

If the same file appears in both a higher and a lower class, unless the first line of the file is "@APPEND@", the file in the lower class will overwrite the file in the higher class. So, for example, if there was an entry in the fcm database for the file /etc/aluminum_pipe for both the default and barneys classes, the version from the barneys class is what would end up installed on barneys. (If the first line of an entry in the fcm database is "@APPEND@", then the contents of that file are appended to whatever is already present on the system.)
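The override and append rules can be tried out in a scratch directory. What follows is only a sketch of the behaviour described above, not fcm's actual code: the install_file helper, the motd file, and the choice to drop the "@APPEND@" marker line when appending are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustration only: mimic fcm's layering in a throwaway directory.
set -eu
work=$(mktemp -d)
mkdir -p "$work/custom/default" "$work/custom/barneys" "$work/root"

# aluminum_pipe: the barneys entry starts with @APPEND@, so it is appended.
printf 'from default\n' > "$work/custom/default/aluminum_pipe"
printf '@APPEND@\nfrom barneys\n' > "$work/custom/barneys/aluminum_pipe"
# motd (hypothetical file): no marker, so the barneys entry overwrites.
printf 'default motd\n' > "$work/custom/default/motd"
printf 'barneys motd\n' > "$work/custom/barneys/motd"

# install_file SRC DEST: append if SRC begins with @APPEND@, else overwrite.
install_file() {
    if [ "$(head -n 1 "$1")" = "@APPEND@" ]; then
        tail -n +2 "$1" >> "$2"   # assumption: the marker line is not copied
    else
        cp "$1" "$2"
    fi
}

# Apply the classes from least specific to most specific.
for class in default barneys; do
    for name in aluminum_pipe motd; do
        install_file "$work/custom/$class/$name" "$work/root/$name"
    done
done

pipe=$(cat "$work/root/aluminum_pipe")
motd=$(cat "$work/root/motd")
printf '%s\n---\n%s\n' "$pipe" "$motd"
rm -rf "$work"
```

After the loop, aluminum_pipe holds the default text followed by the barneys text, while motd holds only the barneys text.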

Each class has a directory

/home/steege/fcm/custom/class/install/

that has all the customized files for that class (substitute the name of the class for "class"). So, if you wanted to change /etc/aluminum_pipe for all of the Deepsearch PCs, you would edit the file:

/home/steege/fcm/custom/default/install/etc/aluminum_pipe

In many cases, the file will already exist in the fcm area. Suppose, however, that /etc/aluminum_pipe is installed with Red Hat, and that it hasn't been modified before. If you now want to modify it, you will have to suck it into the fcm area. This is quite simple. First do:

% cp -p /etc/aluminum_pipe /home/steege/fcm/custom/default/install/etc/

and then edit the file.

Note that things can be subtle. It may be that there is an entry for /etc/aluminum_pipe in both the "default" class and the "rh73" class. The "default" class version would apply to every machine except for the RedHat 7.3 machines. You may then go in and edit the file for the "default" class, but after running fcm see that the file /etc/aluminum_pipe on your system hasn't changed. All of this makes sense if you really understand fcm. The rules are: know what you're doing, and tread with caution. If you don't know what you're doing, just let the sysadmin do it. If you are the sysadmin and you don't know what you're doing, then learn what you're doing, and quick.
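One way to avoid being surprised is to check which class "wins" for a given file before you edit anything. The sketch below is a hypothetical helper, not part of fcm: it walks a class chain from least to most specific and reports the last class that holds the file. It runs against a scratch copy of the directory layout, and it ignores "@APPEND@" entries for simplicity.

```shell
#!/bin/sh
# Illustration only: find the most specific class that supplies a file.
set -eu
custom=$(mktemp -d)   # stand-in for /home/steege/fcm/custom
mkdir -p "$custom/default/install/etc" "$custom/rh73/install/etc"
: > "$custom/default/install/etc/aluminum_pipe"
: > "$custom/rh73/install/etc/aluminum_pipe"

# winning_class FILE CLASS...: classes are listed least specific first.
winning_class() {
    file=$1
    shift
    winner=
    for class in "$@"; do
        if [ -e "$custom/$class/install$file" ]; then
            winner=$class
        fi
    done
    printf '%s\n' "$winner"
}

# The chain for barneys, as read off the class diagram.
winner=$(winning_class /etc/aluminum_pipe default pc rh73 work smp barneys)
echo "$winner"
rm -rf "$custom"
```

Here the answer is rh73, which is why editing the "default" copy would change nothing on barneys.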

The `fcm.conf` file

To be written.

Variable Substitutions

To be written.

"By-Hand" Scripts

fcm mostly works by installing customized files. Sometimes this doesn't entirely work, and there may be a few additional steps you need to take to turn a stock Red Hat system into a happy SCP system. These steps are stored in shell scripts in /home/steege/fcm/byhand. Take a look at what's there, and hopefully you will understand how they work. Normally the byhand scripts aren't something you need to run over and over again.

Running fcm

fcm must be run as root.

First make sure that any emacs backup files have been cleaned out of the fcm database. We don't want to go spamming the workstations' system directories with those. You may do this with the following three commands:

% cd /home/steege/fcm/custom
% find . -name "*~" -not -path "*dist*" -print
% find . -name "*~" -not -path "*dist*" -ok rm \{\} \;

If you see something other than what you expect after the second command, stop and use your brain.
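If you'd like to see what those find commands match before pointing them at the real database, here is a sketch that runs them against a throwaway tree. The file names are made up, "!" is used as the portable spelling of "-not", and "-exec" stands in for "-ok" so that the example doesn't stop to ask for confirmation.

```shell
#!/bin/sh
# Illustration only: exercise the backup-file cleanup on a scratch tree.
set -eu
tree=$(mktemp -d)
mkdir -p "$tree/default/install/etc" "$tree/dist/etc"
touch "$tree/default/install/etc/hosts" \
      "$tree/default/install/etc/hosts~" \
      "$tree/dist/etc/keep~"

cd "$tree"
# Step 1: list emacs backup files outside any path containing "dist".
found=$(find . -name '*~' ! -path '*dist*' -print)
echo "$found"
# Step 2: remove exactly those files.
find . -name '*~' ! -path '*dist*' -exec rm {} \;
# Whatever is left ending in ~ should live under dist only.
remaining=$(find . -name '*~' | sort)
echo "$remaining"
cd /
rm -rf "$tree"
```

The backup file under default/ is listed and removed, while the one under dist/ is left alone.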

Now you may run fcm for the machine you are on. Do this as follows:

% cd /home/steege/fcm
% ./fcm -hmachine_name -q

The "-l" switch tells nerscify to run LILO after it's done with all its customizations. This is usually a good idea, because one of the things that usually gets installed is a kernel. (Even if it's the same kernel as was already there, e.g. from a previous iteration of nerscify, it's possible that nerscify will copy the kernel to different disk sectors, requiring a rerun of LILO for the computer to be able to boot.) If you wish to run the "by-hand" scripts, add a "-b" to the end of the command line.

Suppose you've just made a change that needs to be applied to all the Deepsearch PC workstations (e.g. adding a user to the password file). In that case, there is a shortcut script you may run as root:

cd /home/steege/fcm
./fcm-ws -q

This is a perl script that sequentially runs fcm on each PC workstation. It can take quite a while to run. You might want to look at the script to make sure that it's up to date and has every workstation listed in the rhosts variable at the top.
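The idea behind fcm-ws can be sketched in a few lines of shell. This is not the real script (which is perl); the three host names are just a sample from the class diagram, and reaching each machine over ssh is my assumption, since this guide doesn't say how fcm-ws actually does it. The sketch only prints the commands rather than running them.

```shell
#!/bin/sh
# Illustration only: a dry run of "run fcm on every workstation in turn".
set -eu
# Sample host list; the real one lives in the rhosts variable of fcm-ws.
hosts="barneys milano sauls"

plan=$(for h in $hosts; do
    # Assumption: each workstation is reached over ssh as root.
    printf 'ssh root@%s "cd /home/steege/fcm && ./fcm -h%s -q"\n' "$h" "$h"
done)
printf '%s\n' "$plan"
```

Printing the plan first is a cheap way to confirm that the host list is complete before committing to a long sequential run.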

nerscify Overview

nerscify is a lot like fcm. In fact, it is the predecessor of fcm, and was also written by Rob Knop. fcm is better in a number of ways, but most of these are 'under the hood' improvements. For example, fcm tries to get away with a lot less copying of files than nerscify. Practically, the thing to know is that the redhat-7.1 machines use nerscify, and the redhat-7.3 and redhat enterprise machines use fcm. Thus, you need to do any updating of passwords and things like that using both systems. Hopefully, all of the redhat-7.1 machines will be updated to 7.3 someday, and we can retire nerscify completely. One thing to keep in mind with the 7.1 machines, and therefore with nerscify, is that they use lilo as a boot loader. Thus, any time there is the potential for a kernel to be moved around slightly, it is extremely important that lilo be rerun. Fortunately, there is a switch in nerscify to do exactly this (actually, there is also one in fcm, but it's really never used).

% cd /home/steege/nerscify
% ./nerscify2 -hmachine_name -q -l

The "-l" switch tells nerscify to run LILO after it's done with all its customizations. This is usually a good idea, because one of the things that usually gets installed is a kernel. (Even if it's the same kernel as was already there, e.g. from a previous iteration of nerscify, it's possible that nerscify will copy the kernel to different disk sectors, requiring a rerun of LILO for the computer to be able to boot.) Currently only zacharys, panisse, and the suns are managed by nerscify.

Updating Red Hat 7.1 or 7.3

All of this must be done as root, and generally in and around the directories /home/steege/src/rh-cd-7.1 and /home/steege/src/rh-cd-7.3.

Look at mirror.redhat -- this contains stored definitions for sites that have mirrors of the RedHat FTP site (which is slow). In the Rob era, bnl was usually used, but apparently it's been having trouble lately. It should be obvious how to add other mirrors. However, be very careful not to use a site that is an incomplete mirror, which can cause all sorts of problems.
Once the mirror is updated correctly, log in as root on rustica and run

cd /home/steege/src/rh-cd-7.1
mirror -pupdates-bnl mirror.redhat
./cleanrpms
where the appropriate mirror name has been substituted for updates-bnl. This may or may not copy some files to the /home/steege/pkg/redhat-7.1/updates tree. The cleanrpms command is a hack to deal with some inconsistencies that have crept into the 7.1 tree because it is getting a little old and cluttered.
Once they are in, take a look at /home/steege/pkg/redhat-7.1/other-rpms/* and make sure that you don't have older versions of packages which are also in updates. In our system of deciding what we want to keep, other-rpms takes precedence over updates, so make sure that other-rpms doesn't have cruft in it.
Once you are satisfied with everything, run

./update-rpms
This will move the new things from updates and other-rpms into the main CD directory. The code should be pretty easy to understand -- if you really want to know, look at it to see where it copies files from and in what order.
Do the same but for 7.3. Note that the cleanrpms step is not necessary for the 7.3 tree.
Once everything is up to date, on each 7.1 PC (as of 2004/02/20 there are 2 -- zacharys and panisse. There are also 2 suns, europa and kirala) you need to

cd /home/steege/src/rh-cd-7.1
./rpmupdater.perl --do
It's probably a good idea to try it on a few different machines without the --do to see if everything is happy. Remember, the servers (except, obviously, rustica) need to have /home/steege mounted and unmounted by hand during this process. A (hopefully) up to date list of the current machines can be found by examining /home/steege/nerscify/nerscify.conf
Now update the 7.3 machines (as of 2003/05/06 there are 26: 18 workstations, 6 servers, and 2 snfactory machines). This can either be done by

cd /home/steege/src/rh-cd-7.3
./rpmupdater.perl --do
or by using the rpmupdateall script

cd /home/steege/src/rh-cd-7.3
./rpmupdateall.py
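The step in the procedure above where you check other-rpms for cruft can be partly automated. The sketch below lists package names that appear in both an updates directory and an other-rpms directory; it works on made-up file names in a scratch directory, and it assumes the usual name-version-release.arch.rpm naming. It only flags overlaps. Deciding which copy is older is still up to you.

```shell
#!/bin/sh
# Illustration only: find packages present in both updates and other-rpms.
set -eu
pkg=$(mktemp -d)   # stand-in for /home/steege/pkg/redhat-7.1
mkdir -p "$pkg/updates" "$pkg/other-rpms"
touch "$pkg/updates/openssl-0.9.6b-35.7.i386.rpm" \
      "$pkg/updates/zlib-1.1.3-25.7.i386.rpm" \
      "$pkg/other-rpms/openssl-0.9.6-9.i386.rpm" \
      "$pkg/other-rpms/perl-DBI-1.21-1.i386.rpm"

# pkg_names DIR: strip "-version-release.arch.rpm" to get bare package names.
pkg_names() {
    ls "$1" | sed 's/-[^-]*-[^-]*\.rpm$//' | sort -u
}

pkg_names "$pkg/updates" > "$pkg/updates.names"
pkg_names "$pkg/other-rpms" > "$pkg/other.names"
# comm -12 prints only the lines common to both sorted lists.
overlap=$(comm -12 "$pkg/updates.names" "$pkg/other.names")
echo "$overlap"
rm -rf "$pkg"
```

In this made-up tree only openssl shows up in both places, so that is the one package to look at by hand.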

Updating RedHat Enterprise WS 3

The Enterprise systems are handled a little differently because they are licensed. Basically, the lab maintains a license manager and RPM update system that we update against, instead of maintaining our own package tree. On the other hand, the updating is pretty easy. Simply run

up2date -u

on the machine to be updated. As of 2004/04/15 there are only 3 rhes machines.

Dealing with RPMs

All of this must be done as root, and generally in and around the directory /home/steege/src/rh-cd-version

Finding them

If you want to install an RPM which isn't currently on the system, or to update an rpm that isn't part of the standard RedHat updates tree, take the following steps to find the rpm.
1. Look at the CD directory to make sure it isn't part of standard RedHat. This directory is /home/steege/pkg/redhat-7.1/i386/RedHat/RPMS
2. Look at rawhide, on the redhat ftp site, located at ftp.redhat.com/pub/redhat/linux/rawhide/i386/RedHat/RPMS
3. Go to www.rpmfind.net. Be careful not to download things packaged for other distributions like Mandrake. Always do rpm -qilp on the package to see where it's going to install stuff. If it's going to dump stuff all over /opt or /usr/local, don't use the RPM, but install by hand to /usr/local.
4. Find it some other way. Good luck. Maybe you are only reading these instructions because you have an RPM in hand.
5. Pack your own rpm. Ten bonus geek points for doing this.
6. For Perl modules, you can usually run
  
  /home/steege/src/installer/perlmod2rpm (file).tgz
  on files downloaded from CPAN to get a nice Perl module RPM. That's where most of the rpms in /home/steege/pkg/redhat-7.1/other-rpms/i386/perl come from.
7. Mozilla can be found at
  http://ftp.mozilla.org/pub/mozilla/releases/*/Red_Hat_7x_RPMS/
  Where * is replaced with some appropriate release directory. It usually takes a few days after a Mozilla release for the rpms to be made. Just be patient. Eventually, it gets linked to from the "where to get Mozilla" page, but this is usually a few days after it becomes available.
8. Source rpms can be found on rawhide, or possibly from other sources.

Dealing with dependencies

Try running rpm --test -i (package).rpm on your new rpm (-U instead of -i if updating) to determine the dependencies. Hopefully there won't be many. Depending on how many there are, and how deeply nested the process becomes, you may need to download additional rpms. At some point, it becomes easier to build your own from the source. See the instructions for building source rpms below.

Storing them locally

Put your brand spanking new rpm in /home/steege/pkg/redhat-7.3/other-rpms/i386, unless some subdirectory thereof makes more sense (e.g., perl). If older versions of the rpm are in that directory, remove them (but don't touch the CD directory or the updates tree). This has to be done a lot.

Updating the CD tree

Run /home/steege/src/rh-cd-7.3/update-rpms on rustica as root to get the rpm into the CD directory.

Update the kickstart file

If this is a new RPM, and you want it to be installed on new systems and during reinstallations, you need to stick it in the kickstart file. The kickstart file is /home/steege/src/rh-cd-7.3/ks.cfg. Add the name of the rpm to this file in the appropriate place.

Install

Go to the CD directory /home/steege/pkg/redhat-7.3/i386/RedHat/RPMS and run rpm --test -i (package).rpm (or -U, as appropriate) on a few different machines to make sure everything is good. When you are satisfied, do the rpm -i or rpm -U on each and every PC.

Building source rpms

Sometimes the dependencies of a new rpm are way too much of a pain in the ass to deal with by upgrading everything else. In this case, you may want to build an rpm from the source. A source rpm is called an srpm. Find one and download it. Then rpm -i it on a fast workstation, go to /usr/src/redhat/SPECS on that workstation, and find the .spec file. Run rpm -ba on the .spec file, and if all goes well you should end up with an rpm in the /usr/src/redhat/RPMS/i386 directory. Install this as usual. On the Enterprise machines there is no rpm -ba -- instead use rpmbuild -ba.

Kernels

Compiling new kernels for the SCP Linux machines involves a few additional steps from what you might be used to doing on your home system. This is because new kernels, system maps, kernel module libraries, and LILO configurations are stored as part of our /home/steege/fcm hierarchy.

Kernel source is stored under /home/steege/src

When downloading new kernel source, first rename the existing /home/steege/src/linux directory to something else to avoid overwriting existing source. Then untar the new source into the /home/steege/src directory.

Next apply any needed or relevant patches. Currently the only long-standing patch that we have is the patch for the RaidZone IDE system we currently have on our main home file server, Zacharys. As of this writing (14 December 2001) these patches only exist for the 2.2 series of Linux kernels.

Configuration files are stored in /home/steege/src/configs. The files in this directory contain the basic configuration information appropriate to each type of machine, single-processor, dual-processor, PII, PIII, etc. Copy the appropriate configuration file to the newly created /home/steege/src/linux directory which contains the source you're interested in. Modify the EXTRAVERSION entry in the Makefile to reflect the name of the kernel if you want.

For each of the classes of machines (single, smp, rz, norz, and laptop) do the following:

cd /home/steege/src/linux
make menuconfig

Alternatively you might want to use xconfig:

cd /home/steege/src/linux
make xconfig

Load the appropriate configuration file using the menu command. Then go through all of the options to make sure that things make sense and to double-check that the important things got loaded (e.g. ReiserFS support for Lococos, RaidZone if you're recompiling a 2.2 series kernel for Zacharys).

After finishing and exiting the menu, it's time to compile the kernel. Note that on a multi-processor machine you might want to use the -j3 option for make.

make dep
make clean
make bzImage
make modules

Installing the modules libraries, kernel and System.map file is a little different because we want to install them in the appropriate place in our /home/steege/fcm hierarchy.

First the modules. Change SYSTEMTYPE to the appropriate type: single, smp, rz, norz, or laptop. [The 2.4.14 tag is an example, replace with the kernel version number of the kernel you are compiling.]

make INSTALL_MOD_PATH=/home/steege/fcm/custom/SYSTEMTYPE/install/ modules_install

Next copy the kernel image and system map file to the appropriate places, using the exact same names as you did above.

cp arch/i386/boot/bzImage /home/steege/fcm/custom/SYSTEMTYPE/install/boot/vmlinuz-2.4.14
cp System.map /home/steege/fcm/custom/SYSTEMTYPE/install/boot/System.map-2.4.14
Next edit the /boot/grub/grub.conf file for the appropriate system type.

vi /home/steege/fcm/custom/SYSTEMTYPE/install/boot/grub/grub.conf

Create a new entry for your kernel. Make sure the label doesn't conflict with an existing label, and don't set it as the default until you've tested it on a number of different machines.
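Since the same three install commands get repeated for every class with only SYSTEMTYPE and the version changing, it can help to generate them instead of retyping them. The helper below is hypothetical (nothing like it ships with fcm) and only prints the commands from this section with the two values filled in. Run them yourself from the kernel source directory once they look right.

```shell
#!/bin/sh
# Illustration only: print the install commands for one class and version.
set -eu

# kernel_install_cmds SYSTEMTYPE VERSION
kernel_install_cmds() {
    dest=/home/steege/fcm/custom/$1/install
    printf 'make INSTALL_MOD_PATH=%s/ modules_install\n' "$dest"
    printf 'cp arch/i386/boot/bzImage %s/boot/vmlinuz-%s\n' "$dest" "$2"
    printf 'cp System.map %s/boot/System.map-%s\n' "$dest" "$2"
}

cmds=$(kernel_install_cmds smp 2.4.14)
printf '%s\n' "$cmds"
```

Generating the lines this way keeps the kernel image and System.map names in step with each other, which is the easiest thing to get wrong by hand.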

Next time the machines are nerscified, the new kernels, kernel module libraries and system map file will be automatically copied over and installed. It's a good idea to try some of them out at this point by nerscifying an individual machine and seeing if it comes up under the new kernel and if everything works (ethernet, local filesystems & NFS, X, loop device, etc.).

Automount

We've got lots and lots and lots of disks. It would be horribly cumbersome to mount all the NFS disks at boot time; that would doubtless cause all sorts of other problems such as hanging during boot, etc. At boot time, all that is mounted are the local disks (usually /, /usr, /scratch and local /home/astro* disks) and the most important system disks (/home/steege and /usr/local). Everything else is mounted by the automounter when somebody accesses it.

The automounter watches the directory /autofs. However, most users will want to access the automounted directories by going through /home. There should be a link in /home for every disk mounted by the automounter. This is transparent to most users. Only the system administrator has to worry about it, and it isn't even very hard for her.

The file /etc/auto.home is a list of all automounted disks, and where they are automounted from. For this to work, of course, the computers where they are mounted from must be exporting the proper filesystems. /etc/auto.home lists disks mounted both from Deepsearch PCs and Deepsearch Suns. Don't worry if local disks are listed in /etc/auto.home; the automounter will ignore them, and anyway the directory in /home will be the local mount point for the filesystem rather than a link to /autofs. (This is why you shouldn't get in the habit of referring to paths starting with /autofs, because they won't work on every system.) The same auto.home file is used for every SCP PC Workstation (with the exception of any laptops).
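To make the relationship between auto.home and the links in /home concrete, here is a sketch that reads a map in the usual autofs "key options location" layout and prints the link each entry needs. The two entries and their host names are invented, and the sketch assumes each key appears under /autofs under the same name. It prints the ln commands rather than running them.

```shell
#!/bin/sh
# Illustration only: derive the /home links from an auto.home-style map.
set -eu
map=$(mktemp)   # stand-in for /etc/auto.home
cat > "$map" <<'EOF'
# key      options        location   (hypothetical entries)
astro666   -rw,hard,intr  somehost:/home/astro666
astro667   -rw,hard,intr  otherhost:/home/astro667
EOF

# Skip comments and blank lines; the first field is the mount key.
links=$(awk '!/^#/ && NF >= 2 { printf "ln -s /autofs/%s /home/%s\n", $1, $1 }' "$map")
printf '%s\n' "$links"
rm -f "$map"
```

On a machine where one of these disks is local, you would skip its link, since /home/astro666 would then be the real mount point.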

Other Miscellaneous Info (Passwords, etc.)

To be written.

The Mcdonalds Cluster

To be written.

Backups

There are three things that are backed up on a regular basis on our systems. First, the postgres database is backed up to a DAT on ajanta every night at about 3:30 am. The other two, astro10 and astro9, are backed up alternately every few nights to the DLT drive on zacharys. All three sets of backups are run by crontab jobs.

Postgres

The DAT tape should be changed every morning to the next tape in sequence. There are two boxes of DAT tapes on the desk that should be alternated between on a weekly basis. The success of the backup can be checked by

Log into ajanta
% cd ~postgres/backup
% tail backuplog

Check to see if the backup from last night was completed.
Examine backup.out. In particular, look for any error messages.
If both of these check out, write the date of the backup on the tape container.

If the crontab file is somehow lost, you need to log on as postgres and add it again. The command in the crontab file should eventually look like:

30 3 * * 1,2,3,4,5 /usr/bin/perl -w /home/db/postgres/backup/backupdata.perl > /home/db/postgres/backup/backup.out 2>&1

Remember -- never edit the crontab file directly!
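If the five leading fields of that crontab line look cryptic, the sketch below pulls them apart. It is only a reading aid: it splits the entry shown above and prints the schedule in words. In cron, the fifth field is the day of the week, with 1 through 5 meaning Monday through Friday. To actually change the entry, use crontab -e while logged in as postgres instead of touching the spool file.

```shell
#!/bin/sh
# Illustration only: decode the schedule fields of the postgres backup entry.
set -eu
entry='30 3 * * 1,2,3,4,5 /usr/bin/perl -w /home/db/postgres/backup/backupdata.perl'

set -f          # keep the shell from expanding the * fields as globs
set -- $entry   # split the entry on whitespace into $1, $2, ...
minute=$1
hour=$2
dow=$5
set +f

summary=$(printf 'runs at %02d:%02d on days of the week %s' "$hour" "$minute" "$dow")
echo "$summary"
```

So the job fires at 03:30 on weekdays only, which means there is no fresh tape to check on Sunday or Monday morning.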

astro10 and astro9

Only root can log into zacharys, so all of this will have to be done as root. There is a backup log, which should be located on top of zacharys, and which should be initialed on the success of any backup or the insertion of another backup tape. The log also tells you when to insert new tapes, and which tapes to insert. There are two sets of 8 tapes (tapeset A and tapeset B). Ideally, the one not in use should leave the lab completely to minimize the loss in a catastrophe. Failures should be noted on the log. The date of any successful backup should also be noted on the plastic tape container for that tape. To check the success of the most recent backup:

ssh -lroot zacharys
cd backup
Examine the logfile with

tail backuplog
Check the size of the backup_astro?.out file (where ? is 9 or 10 depending which backup you are checking):

ls -l backup_astro?.out

The size of the file should be about the same size as the usage of the disk (under cpio). This can be checked with df. If dump is being used the .out file will be much smaller.
If dump is being used, examine the .out file directly. If cpio or afio has been used, the file is too large to examine manually.

To generate a new log, do this:

% /home/lilys/rknop/admin/makebackuplog.perl date > log.tex

Where date is the date in YYYY MM DD format of the first monday not covered by the previous log.
% latex log.tex
% dvips log.dvi

The crontab entries look like this:

30 22 * * 3 /usr/bin/perl -w /root/backup/backup /home/astro10 > /root/backup/backup_astro10.out 2>&1
30 22 * * 5 /usr/bin/perl -w /root/backup/backup /home/astro9 > /root/backup/backup_astro9.out 2>&1

Other backups

There are a few other disks that it is good to back up occasionally. astro1 should be backed up every few months, as well as any private disks you feel like protecting.

Mail on the deepsearch PCs

Currently, the mail server runs on zacharys. All of the other deepsearch machines have an MX entry that forwards mail to zacharys. Zacharys sticks incoming mail on /var/mail, which is exported to all of the other computers. Outgoing mail is handled by individual calls to sendmail, but only zacharys runs the sendmail daemon.

If you want to add a new computer and have it forward mail to zacharys, you need to talk to the LBL networking people and have them add an MX record for the new machine. You then need to set up zacharys to receive mail that has been targeted for that machine. The file to edit (and install with nerscify) for zacharys is /etc/mail/local-host-names. Just add the new machine name in the same fashion as the other entries.

The file that actually controls how sendmail is run on every machine is /etc/sendmail.cf. If you are sufficiently masochistic you could try to edit this file directly using the usual nerscify method, but fortunately there is a better way.

The sendmail.cf can be configured by building from .mc files, which are considerably easier to read and understand. There are two .mc files that are used on our computers. They both reside in /usr/share/sendmail-cf/cf/ on panisse. panisse.mc controls how panisse deals with mail, and scp.mc controls how the other computers deal with mail. Essentially, scp.mc tells the other computers to re-wrap any outgoing mail that they send so that it looks like it comes from panisse, and panisse.mc tells panisse how to handle incoming mail. For information on how to build sendmail.cf from these files, consult the sendmail web page. The right directory to be working in is /usr/share/sendmail-cf/cf

Procedure for Adding/Editing a System File

...was covered in the fcm section.

Procedure for Changing an IP

Sometimes, for whatever reason, you need to change the IP of an already existing computer. Here I will assume that you aren't changing the name, just the IP.

Get the IP changed by talking to the lab folks: ip-request@lbl.gov
Reboot the computer in single user mode by adding single to the boot line in grub.
Edit /etc/sysconfig/networking and /etc/sysconfig/network-scripts/ifcfg-eth0 (assuming that the ethernet device is eth0) and /etc/hosts to reflect the new address.
Reboot the machine
Remove the old IP address and add the new one in both places in the hosts.allow file in fcm /home/steege/fcm/custom/pc/install/etc/hosts.allow
Do the same under nerscify if necessary
Update the ssh key to refer to the new address in /home/steege/fcm/custom/default/install/etc/ssh/ssh_known_hosts*. Again, do the same on nerscify
Edit pg_hba.conf for lilys: /home/steege/fcm/custom/lilys/install/home/db/postgres/data/pg_hba.conf
fcm and nerscify all of the machines using the appropriate scripts
Go around to all of the servers and do exportfs -r
Restart the postgres server on lilys: /etc/rc.d/init.d/postgresql restart
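
Because the old address is spread over several files in the fcm and nerscify trees, a quick search before running fcm can confirm that none was missed. The following is a minimal sketch; find_stale_ip is a hypothetical helper name, the address in the example is a documentation address rather than a real Deepsearch one, and the -w option assumes GNU or BSD grep.

```shell
# Sketch: list the files that still mention an old IP address.
# find_stale_ip is a hypothetical helper name.
find_stale_ip() {
    old_ip=$1
    shift
    # -F: treat the address as a fixed string (dots are not wildcards)
    # -w: whole-word match, so 192.0.2.17 does not match 192.0.2.170
    # -l: print only the names of the files that match
    grep -lFw -- "$old_ip" "$@"
}

# Example, using paths from the steps above:
#   find_stale_ip 192.0.2.17 \
#       /home/steege/fcm/custom/pc/install/etc/hosts.allow \
#       /home/steege/fcm/custom/lilys/install/home/db/postgres/data/pg_hba.conf
```

grep exits with a non-zero status when nothing matches, which here is the desired outcome.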

Procedure for Adding a Disk

If you are going to add a disk to one of the PCs, you must take the following steps. Most of them (everything but playing with hardware) must be done as root. These instructions are specific to installing SCSI disks. Furthermore, these instructions apply to ext2/ext3 filesystems.

Install the hardware. I won't say much more about this, but it often isn't as easy as all that. You should worry about SCSI termination, proper SCSI3-SCSI2 conversions, SCSI cable lengths, SCSI addresses, and numerous other things.
If you add a disk with SCSI ID lower than any disk that's already on the system, lots of things are going to break. Adapt your fcm configuration files to reflect the new SCSI bus. I will say no more, because if you get this far you'd better know what you're doing (or the damage you've done is only beginning).
Format the disk. Start with "fdisk" to partition the disk. Be very careful, or you will delete vital Deepsearch data! (I.e., make sure you're running it on the right SCSI device.) Then run "mke2fs" to make a filesystem. You should specify lots of extra options to "mke2fs" in order to make the disk maximally efficient. (Hints: we often have big files, so we don't need too many inodes; also, the system administrator does not need 5% of an 18GB disk set aside for his use.)

Example:

mke2fs -j -T largefile -m 1 /dev/something
Choose a name for the disk. This is usually /home/astro*, where the * is a number which hasn't already been used by any of the other /home/astro* disks on any of the Deepsearch Suns or the Deepsearch PCs. All further examples below will assume you've chosen the name /home/astro666.
Add the disk to the machine's fstab. Edit

/home/steege/fcm/custom/machine_name/install/etc/fstab
This requires you to know what you're doing.
Create the mount point for the disk. If the disk is to be /home/astro666, you would do this with the commands:

% mkdir /home/astro666
% mkdir /home/steege/fcm/custom/machine_name/install/home/astro666
Export the disk. Edit:

/home/steege/fcm/custom/machine_name/install/etc/exports
This requires you to know what you're doing.
Put the disk in the automounter for all the other PCs. Edit:

/home/steege/fcm/custom/pc/install/etc/auto.home
Make the automounter link for all of the other machines. Again, if your disk is /home/astro666, you would do this with the commands:

% cd /home/steege/fcm/custom/pc/install/home
% ln -s /autofs/astro666 astro666
fcm the machine upon which you are mounting the disk.
Mount the disk (for now) by hand:

% mount /home/astro666
Ideally, you won't ever have to do this again, because the disk will be mounted automatically the next time the system starts.
Do "df", look at the disk, cd to it, generally make sure things are OK.
Manually export the disk:

% /usr/sbin/exportfs -r
This will never be necessary again, for henceforth it will automatically be done when the system starts.
Fcm the rest of the PCs (using fcm-ws as described above).
If the new disk is a deepsearch data disk, it needs to be added to the DEEPIMAGEPATH. This is set by editing /home/lilys/idl/idl_setup.csh and /home/lilys/idl/idl_setup.sh and/or one of the other versions having to do with the nearby or snfactory setups.
Add the disk to the Sun automounter in auto_direct.
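
When choosing a name in the step above, it helps to see which astro numbers are already taken. The sketch below scans automounter map files for keys ending in astroNNN and prints the next number above the highest one found. It assumes each map entry starts with its key (for example "astro47" in auto.home or "/home/astro47" in the Sun auto_direct); next_astro is a hypothetical helper name, and it only knows about the files it is given.

```shell
# Sketch: print the lowest astro number above those already used as keys
# in the given automounter maps. next_astro is a hypothetical helper name.
next_astro() {
    awk '$1 ~ /(^|\/)astro[0-9]+$/ {
             n = $1
             sub(/^.*astro/, "", n)          # keep only the digits
             if (n + 0 > max) max = n + 0
         }
         END { print max + 1 }' "$@"
}

# Example:
#   next_astro /home/steege/fcm/custom/pc/install/etc/auto.home
```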

Handling RAID

A RAID array is a set of disks which have been combined together into one larger virtual disk. Typically this includes some sort of redundancy, so that you can have a disk die without actually losing any data. Usually, this means that the size of the larger virtual disk is less than the sum of the sizes of the disks that comprise the array. For example, on rustica, /dev/md0 is a RAID-5 array of four 50GB disks. The size of /dev/md0 is about 150GB (3x50GB).
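
The capacity arithmetic in the rustica example generalizes: a RAID-5 array of N equal disks gives up one disk's worth of space to parity. A small sketch (raid5_usable is a hypothetical helper name):

```shell
# Sketch: usable capacity of a RAID-5 array of N equal disks.
# One disk's worth of space holds parity, so usable = (N - 1) * size.
raid5_usable() {
    disks=$1
    size_gb=$2
    if [ "$disks" -lt 3 ]; then
        echo "RAID-5 needs at least 3 disks" >&2
        return 1
    fi
    echo $(( (disks - 1) * size_gb ))
}

raid5_usable 4 50    # the rustica example: prints 150
```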

Mostly we use software RAID, but currently there are two computers with some form of hardware IDE RAID: zacharys and drago. Zacharys uses a RAIDZone product, which unfortunately has not been very well supported. Information about the status of the RAIDZone can be found by logging onto zacharys and running /usr/sbin/raidzone. The hardware RAID on drago is based on a 3Ware Escalade 8506 SATA card, which should have better support. Information about the status of this RAID can be obtained either by rebooting drago or by logging onto drago and using a web browser to access http://localhost:1080/. This web page is only available from drago.

Software RAID is handled by the kernel, and most of the time doesn't require much thought. Any SCSI disks we have in RAID arrays are managed by the Linux kernel software RAID system. You can treat the /dev/md devices just like any other device. However, you must take some care when handling disks that are part of a RAID array. Some points of order:

Read the Software RAID HOWTO! Get the current version linked here, not the version that's on the Linux Documentation Project.
Never run fsck on an individual disk which is part of a RAID array. Only run fsck on the /dev/md device. For example, it would be a spectacularly bad idea to run fsck /dev/sdc1 on rustica.
Keep the disks that are part of a RAID array together. Obviously, bad things could happen if you move pieces of a RAID array to a different computer.
Keep /etc/raidtab (in the fcm area) for each computer updated with current information about all Software RAID arrays running on that system.

The status of a software raid array can be found by examining the /proc/mdstat file on the machine it is mounted on:

% cat /proc/mdstat

Hopefully you will see all U-s. If you don't, one (or, God forbid, more) of the disks has failed. If only one disk has failed, the array should still be functional (we only use RAID-5 here), but it is imperative that you replace the non-functional disk first. Refer to the Software RAID HOWTO, but here are roughly the steps you should follow if only one disk has failed. If more than one has failed, you are in considerable trouble. Consult the HOWTO for minimal advice.

See if you can get the failed disk to work again by checking the power connector, interface cable, etc.
If this fails, find another disk of the same size OR LARGER as the original disk, and with the same type of interface.
Connect the replacement or fixed disk to the computer hosting the RAID. Make sure that you do not disturb the device ordering of the currently existing disks. Even if you have gotten the old device to work you will need to move it to the end of the SCSI chain. This is because the persistent superblock now expects things to be in a certain order, and if you disturb it then the RAID will stop working even in degraded mode. Note that with the persistent superblock set the raidtab file is completely ignored, so you won't be able to trick the computer into accepting a new ordering.
Once the computer is up and running with the RAID in degraded mode and the new disk hanging off of the end of the SCSI chain, use raidhotadd to add in the replacement disk. This will rebuild the raid array, which may take several hours on a big device.
Once it is rebuilt, check /proc/mdstat to make sure it is happy. Notice your devices are now most likely in a different order.
Edit the raidtab to reflect the new ordering, which you can check by looking at /proc/mdstat again
If you had to completely replace the disk, check to see if it is under warranty
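
The "all U-s" check on /proc/mdstat described above can be scripted, for instance to run from cron. The sketch below prints the name and member status of every array whose status string contains an underscore (a missing disk). degraded_arrays is a hypothetical helper name, and the parsing assumes the usual mdstat layout in which each array starts with a line such as "md0 : active raid5 ..." and its status appears in brackets such as [UUU_].

```shell
# Sketch: report software RAID arrays whose member status shows a failed disk.
# Reads /proc/mdstat by default; degraded_arrays is a hypothetical helper name.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }              # remember the current array
         match($0, /\[[U_]+\]/) {                 # status such as [UUU_]
             status = substr($0, RSTART, RLENGTH)
             if (index(status, "_")) print name, status
         }' "${1:-/proc/mdstat}"
}

# Example: prints nothing when every array is healthy.
#   degraded_arrays
```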

Ideally, my SCP Disk Usage Page should list which PCs are using RAID arrays, and which disks on those PCs are going into the RAID arrays.

Procedure for Adding a User

Don't believe anything you read in the Red Hat manual or elsewhere about how easy it is to add a user using some "adduser" script or some GUI utility. It will generally be harder to massage those things into something that fits with the Deepsearch PC setup than it will be to just do it the way it used to be done when men were real men, women were real women, and sysadmins were real geeks.

Figure out the UID for the user. If the user is going to also have a Sun account, everybody's life will be MUCH happier if you give her the same UID on the PCs as she will have on the Suns. If she doesn't have her Sun account yet, you can give her a temporary UID, but once she gets her Sun account you're going to have to change the UID and the UID on all the files she's created. So just pick a high number (as of this writing, 5 March 2001, we're in the 20,000s) and make it the same on both systems. You can often refer to the previous account created (generally the last line in the 'passwd' file) for a good place to start.
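
Rather than relying on the last line of the passwd file, the highest UID in use can be computed directly. This is a minimal sketch; next_uid is a hypothetical helper name, the cutoff of 65534 is an assumption meant to skip "nobody"-style accounts, and the result still has to be checked against the Suns by hand.

```shell
# Sketch: suggest the next free UID from a passwd-format file.
# next_uid is a hypothetical helper name.
next_uid() {
    # Field 3 is the UID. Skip commented-out lines and UIDs of 65534 and
    # above so that "nobody"-style accounts do not skew the maximum.
    awk -F: '!/^#/ && $3 ~ /^[0-9]+$/ && $3 + 0 < 65534 && $3 + 0 > max { max = $3 + 0 }
             END { print max + 1 }' "$1"
}

# Example:
#   next_uid /home/steege/nerscify/custom/pc/install/etc/passwd
```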

Do the following steps as root on zacharys:
Mount /home/steege on zacharys (/home/steege lives on Rustica and by default is not mounted on the other file servers to avoid NFS hangs if a machine goes down. So, please, after you're through, "umount /home/steege" as it says below.)

zacharys% mount /home/steege
Add the user to the password files. Edit:

/home/steege/nerscify/custom/pc/install/etc/passwd
Give her GID 4028 (deepsrch). Make her shell /bin/tcsh unless she whines and wants something different. Make her home directory /home/lilys/username, unless there is a /home/machine_name directory for the workstation on her desk, in which case you might want to put her home directory there. (Use your brain.) Put "x" in the password field of the passwd file.
Edit:
/home/steege/nerscify/custom/server/install/etc/passwd
and do the same as above except make the shell /bin/false to disable logins.
Add the user to the shadow password file. Edit:

/home/steege/nerscify/custom/pc/install/etc/shadow
/home/steege/nerscify/custom/server/install/etc/shadow
Put "*" in the password field.
Add the user to the phyd group in the group file:

/home/steege/nerscify/custom/pc/install/etc/group
Nerscify zacharys (/home/lilys is currently on zacharys)

zacharys% cd /home/steege/nerscify
zacharys% ./nerscify2 -h zacharys -q -l
Umount /home/steege from zacharys. You must first change to a directory not on /home/steege

zacharys% cd
zacharys% umount /home/steege
Create the user's home directory in /home/lilys. Run chown and chgrp so that she owns her directory and set the directory permissions:

zacharys% cd /home/lilys
zacharys% mkdir (username)
zacharys% chown (username):deepsrch (username)
zacharys% chmod 755 (username)

Do the following steps as root on sauls (or any other non server):
fcm sauls so that it is aware of the new user. See the instructions above for zacharys
Use "cd" to change into the ~username directory.
Do "su - username".
Install standard base startup files:

% cp -p /etc/skel/* ./
% cp -p /etc/skel/.* ./
Do "passwd username" as root to set a default password for the user. This has the bonus side effect that it will copy the password file to all of the other PC Workstations. You're still on sauls, right? This step really needs to be done from a workstation machine because they mount '/usr/local', where the appropriate passwd.scp command resides which automatically keeps the password files synchronized. I like to make the initial password really nasty to increase the chance that the user will change it quickly.
fcm all machines.
For the servers:

sauls% cd /home/steege/fcm
sauls% ./fcm-server -q

And now for the workstations
sauls% cd /home/steege/fcm
sauls% ./fcm-ws -q -l

In case you're wondering, the fcm-server script automatically mounts and unmounts /home/steege/ as necessary.
Do "exit" to get out of the su.
Tell the user to change her password, and nag her until she does.
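
To make the file formats in the steps above concrete, here is a sketch that composes the passwd and shadow entries. GID 4028 (deepsrch), the /bin/tcsh default, the /home/lilys home directory, the "x" and "*" password fields and /bin/false for the server copy all come from the procedure; the function names and the example user are hypothetical, and the empty GECOS field is an assumption.

```shell
# Sketch: compose the passwd and shadow entries described in the procedure.
# passwd_line and shadow_line are hypothetical helper names.
passwd_line() {
    user=$1 uid=$2 shell=${3:-/bin/tcsh}
    # name:password:UID:GID:GECOS:home:shell, with "x" deferring to shadow
    printf '%s:x:%s:4028::/home/lilys/%s:%s\n' "$user" "$uid" "$user" "$shell"
}

shadow_line() {
    # name:password followed by seven empty fields; "*" blocks logins
    # until passwd sets a real password
    printf '%s:*:::::::\n' "$1"
}

# Example for the pc and server passwd files respectively:
#   passwd_line alice 20018
#   passwd_line alice 20018 /bin/false
```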

Procedure for Deleting a User

Edit:

/home/steege/nerscify/custom/pc/install/etc/passwd
/home/steege/nerscify/custom/server/install/etc/passwd
If you wish to disable the user's account, set her shell to /bin/false. If you wish to delete the account, comment out the line in the file by adding a "#" at the beginning.
Edit:

/home/steege/nerscify/custom/pc/install/etc/shadow
/home/steege/nerscify/custom/server/install/etc/shadow
If you are disabling the user's account, set the password field to "*" (without the double quotes). If you are deleting the user's account, comment out the line in the shadow file by adding a "#" at the beginning.
Delete the user's home directory. Warning: You may not really want to do this! The user may still have files that the Deepsearch needs! In any event, it is generally considered polite to back the directory up to a tape first. (Use a 2GB 90m DDS tape, as those are the cheapest, if we're talking less than 2GB of data.)
Find all other random directories left over on all Deepsearch disks in user and scratch partitions loaded with useless files that nobody else is ever going to care about, and delete them. This is a lot of work. Feel free to procrastinate on it for months. Again, it's polite to back up any directories you delete before actually deleting them.
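
Commenting out the user's line, as described in the steps above, can be done with sed so that the same edit is applied identically to all four files. This is a sketch only: comment_out_user is a hypothetical helper name, it prints the edited file rather than changing it in place, and it assumes the user name contains no characters that are special in a regular expression.

```shell
# Sketch: print a passwd- or shadow-format file with one user's line
# commented out. comment_out_user is a hypothetical helper name.
comment_out_user() {
    user=$1
    file=$2
    # Anchor on "^user:" so that "al" does not also match "alice".
    sed "s/^${user}:/#&/" "$file"
}

# Example: review the result before replacing the original.
#   f=/home/steege/nerscify/custom/pc/install/etc/passwd
#   comment_out_user alice "$f" > /tmp/passwd.new && diff "$f" /tmp/passwd.new
```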

Procedure for Installing a New 7.3 Workstation

The first step is to install the actual hardware. It is assumed that you know how to do this. If this is a new machine with a name that isn't currently registered to us, you need to let the lab networking folks know you need an IP address. It is no longer necessary to contact distributed printing about a new machine.
Updating the CD image
- cd /home/steege/src/rh-cd-7.3
- Examine mirror.redhat to make sure that it is up to date. Make sure that the mirror sites listed in the file are still valid.
- mirror -pupdates-bnl mirror.redhat
  
  This will get updates from the Brookhaven mirror, which is one we seem to have a particularly good connection to. If there are updates to be had, this will start transferring them over by ftp.
- Examine update-rpms to make sure that all of the necessary rpm directories are present. If you have decided to add a new rpm directory, you will need to add it to this script. If you haven't, things are most likely fine and don't worry about this. Rpms are kept in the following directories at the moment:
  
  /home/steege/pkg/redhat-7.3/i386/RedHat/RPMS -- rpms that came with the redhat distribution
  /home/steege/pkg/redhat-7.3/updates/ -- rpm updates
  /home/steege/pkg/redhat-7.3/other-rpms/ -- rpms added by us
  
  These are looked at sequentially, in the order determined by update-rpms, so don't expect repeats to be handled correctly.
- Run update-rpms
Update the kickstart image
- The kickstart config file is /home/steege/src/rh-cd-7.3/ks.cfg . This needs to be modified for each machine you want to install on. For example, only panisse needs to have the ftp server installed, and only the servers need the ups package.
- Note the root password set in the kickstart file:
  grep rootpw /home/steege/src/rh-cd-7.3/ks.cfg
- Find a floppy and write the bootnet.img file to it. This file will be in /home/steege/pkg/redhat-7.3/i386/images
  If updates have been issued by redhat, you'll need to look in /home/steege/pkg/redhat-7.3/updates/images. Do this on the computer you are working on -- not on rustica.
  
  cd /home/steege/pkg/redhat-7.3/i386/images
  dd if=bootnet.img of=/dev/fd0
  mount /mnt/floppy
  cd /home/steege/src/rh-cd-7.3
  cp -p ks.cfg /mnt/floppy
  umount /mnt/floppy
  
  If you feel anal you can check /etc/fstab to make sure the floppy is really called /dev/fd0 and mounted on /mnt/floppy.
Use nerscify and fcm to add the address of the new machine to the hosts.allow files. Try /home/steege/nerscify/custom/pc/install/etc/ and /home/steege/fcm/custom/pc/install/etc/ Also make sure to edit the custom files for the servers. Add the IP address (if new) to the file pg_hba.conf on Lilys using fcm. Then add the new machine to the .shosts file: /home/steege/nerscify/custom/pc/install/root/.shosts and /home/steege/fcm/custom/pc/install/root/.shosts. Add the name of the new machine to the export lists in nerscify.conf and fcm.conf. Once these are updated on rustica and panisse, ssh to rustica and re-export the filesystem:

[root@rustica /root]# /usr/sbin/exportfs -r

Put the floppy in the new computer and boot it. With luck, everything will work. If not, good luck. I recommend the following partitions:

/         1024MB
/usr      4096MB
swap      2048MB
/scratch  Remaining space

Modify the fcm-ws script to include the new computer, so that you can use nerscify on it
Rebooting and customizing
- Note, the root password has already been set in the kickstart file (see above).
- mkdir /home/steege
  mount -t nfs rustica:/home/steege /home/steege
- Edit lots of stuff in /home/steege/fcm/fcm.conf. Be a good sysadmin and update the comments at the beginning of both this file and nerscify.conf to reflect the new arrangement.
- Make a custom directory for the new computer in /home/steege/fcm/custom. Copy over the files from a similar computer and edit them to match, if this is possible. Copy the fstab file that the installation created into the custom directory, adding an @APPEND@ line at the beginning so that nerscify knows to add in the shared disks.
- An ssh host key should have been generated during the install process. Edit the known hosts files in /home/steege/fcm/custom/default/install/etc/ssh and /home/steege/nerscify/custom/default/install/etc/ssh to add the new computer. The keys can be found in /etc/ssh.
- Apply your modifications with fcm:
  
  cd /home/steege/fcm
  ./fcm -h < machine > -q -b
  
  Currently this needs to be done from X-windows for the pyraf installation in the byhand scripts to work correctly.
Configure Xwindows

Edit the /etc/X11/XF86Config-4 file and add additional display modes as desired. Then copy the file to the fcm directory:

cp -p /etc/X11/XF86Config-4 /home/steege/fcm/custom/< machine >/etc/X11/
Adjust RPMS

Sadly, at the moment the installation procedure doesn't quite work correctly under RedHat 7.1. Some of the RPMs don't get installed correctly. There are a few things that need to be done by hand. Details can be found here. If you aren't installing a 7.1 system, you can ignore this.
Deal with astro disks

If the new machine exports any /home/astro disks, you will need to follow the procedures for adding a new disk if you want the other computers to be able to automount it.
Add to passwd.scp

Add the name of the new machine (if it is a new machine, and not a replacement for an old one with the same name) to /usr/local/bin/passwd.scp. This must be edited on rustica or lilys. Note that this file does not seem to be currently handled by nerscify.
Mail

If this machine is not a replacement for an old machine (i.e., it has a brand new, not previously used name), ask the LBL networking people to add an MX record to the DNS tables for the new machine forwarding all mail to panisse. Then, edit the local-host-names file as described above under mail to get panisse to accept mail addressed to the new machine. In theory, the byhand part of the nerscify script will install the appropriate sendmail.cf, but you should check.
Web stuff

If this is a new machine, the computer should be added to the .htaccess lists on panisse (/home/panisse/www/htdocs/groupwork/collab/.htaccess) and lilys (/autofs/astro47/httpd/html/deepdb/.htaccess). Neither of these files is handled by fcm/nerscify.
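
The CD image step above warns that the RPM directories are read in sequence and that repeats are not handled well. One way to spot candidates for trouble is to list the package names that occur in more than one directory. The sketch below derives names from file names of the form name-version-release.arch.rpm; dup_rpm_names is a hypothetical helper name and it does not inspect the packages themselves.

```shell
# Sketch: list package names that occur in more than one RPM directory.
# dup_rpm_names is a hypothetical helper name.
dup_rpm_names() {
    for dir in "$@"; do
        for f in "$dir"/*.rpm; do
            [ -e "$f" ] || continue      # directory holds no .rpm files
            base=${f##*/}                # strip the directory part
            name=${base%-*-*}            # strip -version-release.arch.rpm
            printf '%s %s\n' "$name" "$dir"
        done
    done | sort -u |
        awk '{ count[$1]++ } END { for (n in count) if (count[n] > 1) print n }' |
        sort
}

# Example, using the directories listed above:
#   dup_rpm_names /home/steege/pkg/redhat-7.3/updates \
#                 /home/steege/pkg/redhat-7.3/other-rpms
```

The sort -u step collapses several versions of one package within a single directory, so only names shared across directories are reported.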

Procedure for Installing a New RHEL 3 Workstation

Again, start by installing the relevant hardware
Make sure you have a license with activation key. If a new license is needed, contact licenses.lbl.gov.
The kickstart file
- Edit the kickstart file in /home/steege/src/rh-cd-es3 to reflect the information for the machine -- subnet, gateway, etc., plus any extra packages that need to be installed.
- Make the kickstart file available on one of the public web pages temporarily.
- Note the root password set in the kickstart file:
  grep rootpw /home/steege/src/rh-cd-es3/ks.cfg
Boot off of the boot CD. At the prompt type
linux ks=(URL of ks file)

I recommend the following partitions (assuming that no /home/astro type disk will be created on this computer):

/         1024MB
swap      2048MB
/usr      5120MB
/var      3096MB
/scratch  Remaining space

Register the license using the instructions that the lab provides. I would recommend against disabling automatic kernel updates.
Update using up2date -u.
Install all of the RPMS in /home/steege/pkg/redhat-es3/i386. This may require some --force action.
Set up fcm for the new computer and then run it.
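
If the partitioning is to be done by kickstart rather than by hand, the recommended layout above can be expressed as kickstart "part" directives. The contents of the actual ks.cfg are not shown in this guide, so the following is only a sketch of how such lines could be generated; ks_partitions and the "rest" keyword are hypothetical, and the sizes are those from the table.

```shell
# Sketch: turn "mountpoint size" pairs into kickstart "part" directives.
# Sizes are in MB; the hypothetical keyword "rest" means "use what is left".
ks_partitions() {
    while read -r mount size; do
        case $size in
            rest) echo "part $mount --size 1 --grow" ;;   # grow to fill the disk
            *)    echo "part $mount --size $size" ;;
        esac
    done
}

printf '%s\n' '/ 1024' 'swap 2048' '/usr 5120' '/var 3096' '/scratch rest' |
    ks_partitions
```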

Procedure for Restoring Lilys

To be written.

Procedure for Restoring a Workstation

To be written.

Boot Floppies

If you need to make a new boot floppy, do the following:

Insert a floppy.

cd /home/steege/rescue
dd if=emer_boot of=/dev/fd0
/usr/sbin/rdev -r /dev/fd0 49152
/usr/sbin/rdev /dev/fd0 /dev/fd0
Insert another floppy

dd if=emer_root.gz of=/dev/fd0
Insert another floppy

dd if=emer_util of=/dev/fd0
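
Floppies are unreliable, so it can be worth reading each one back after writing it. The sketch below compares an image with the first bytes of the device it was written to; verify_image is a hypothetical helper name and cmp -n assumes GNU or BSD cmp. Note that it only makes sense for images written unmodified: the first floppy is altered by rdev after the dd, so it will no longer match emer_boot exactly.

```shell
# Sketch: check that an image was written to a device intact.
# verify_image is a hypothetical helper name.
verify_image() {
    image=$1
    device=$2
    # Compare only as many bytes as the image holds; the device is larger.
    size=$(( $(wc -c < "$image") ))
    cmp -n "$size" "$image" "$device"
}

# Example, after "dd if=emer_util of=/dev/fd0":
#   verify_image emer_util /dev/fd0 && echo "floppy OK"
```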

Maintained and written by Alex Conley (AJConley@lbl.gov).
