Cluster computing: when your dinky laptop just isn't enough

We sometimes run into situations where it's necessary to, say, fit a whole series of mixed-effects regression models to a very large dataset, or do lots of sampling to estimate a Bayesian model.

Our personal computers are generally good enough for simple tasks, but for these computationally intensive tasks they're not always big or fast enough. For this, we have clusters: many smaller computers which have been taped together. There are two clusters that people in the lab generally use: the Computer Science cluster and the University's Center for Integrated Research Computing cluster (Bluehive). The Center also has a supercomputer (BlueGene), which you probably won't be allowed to use.

We'll assume that you already have an account, or know how to get one, and will walk through logging into the cluster, running things interactively, submitting batch jobs, and writing (at least minimally) parallel code.

Logging in

You'll use ssh. On a Mac, this is as easy as opening up a Terminal, typing in ssh username@bluehive.crc.rochester.edu (to log into Bluehive), and then typing in your password when prompted. If you're off-campus, you'll need to first connect to the University's VPN.

This process can be made much more secure (and faster, if you like to live dangerously or get a kick out of editing your ~/.profile) by using ssh keys. ssh keys replace the normal password exchange that takes place when you log in to a server using ssh with public key cryptography. You will have two keys: a public key that everyone can see, and a private key that no one else can ever be allowed to see. You will distribute your public key far and wide, putting a copy of it on every server you want to log into, while your private key will sit securely (and if you're really paranoid, encrypted-ly) on your computer. The public key is a very large number, which the server uses to encode a small message that is very, very difficult to decode without the private key (unless you have a quantum computer).

If you haven't already done so, you need to create your public-private key pair:

cd ~/.ssh
ssh-keygen

This will prompt you to enter a passphrase. If you do enter a passphrase, you will either have to type it in every time you log in to a server using these keys, or have to futz with ssh agents. If you don't enter a passphrase, then you will never have to enter your password again when using these keys. The trade-off is that anyone who can read your private key file (i.e., someone logged into your computer as you) can then log into every server which uses that key pair and cause all kinds of mischief.
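If you'd like to see what ssh-keygen actually produces without touching your real keys, here is a minimal sketch that makes a throwaway pair in a scratch directory. The scratch directory, the demo_key file name, and the empty passphrase (-N "") are choices made for this illustration only; for your real keys, run the interactive command and make the passphrase decision yourself.

```shell
# Make a throwaway key pair in a scratch directory (illustration only).
demo_dir=$(mktemp -d)

# -t rsa   : key type, matching the id_rsa file names used in this tutorial
# -N ""    : empty passphrase, so nothing is prompted for (demo only!)
# -f FILE  : where to write the private key; the public key goes to FILE.pub
# -q       : quiet, skips the progress output
ssh-keygen -q -t rsa -N "" -f "$demo_dir/demo_key"

# Two files come out: demo_key (private, keep it to yourself)
# and demo_key.pub (public, safe to hand out).
ls -l "$demo_dir"

# Print the key's fingerprint, a short summary that identifies the pair.
ssh-keygen -l -f "$demo_dir/demo_key.pub"
```

The .pub file is a single line of text, which is why it can simply be appended to a file on the server.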

The next step is to distribute your public key, which is probably in a file called ~/.ssh/id_rsa.pub, to the server, and put it in a file called authorized_keys in the .ssh directory:

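# On your laptop: copy the public key to your home directory on the server
scp ~/.ssh/id_rsa.pub username@bluehive.crc.rochester.edu:pubkey.txt
ssh username@bluehive.crc.rochester.edu
# On the server: make sure ~/.ssh exists and only you can get into it
mkdir -p ~/.ssh
chmod 700 ~/.ssh
# Add the key to the list of keys allowed to log in as you
cat ~/pubkey.txt >> ~/.ssh/authorized_keys
rm ~/pubkey.txt
chmod 600 ~/.ssh/*
exit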

Now you can log in using the keys instead of your password: ssh username@bluehive.crc.rochester.edu. If you used a passphrase to create the private key, you'll have to enter it again here.
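The server-side steps above can be tried out safely on your own machine. The sketch below wraps them in a small function, install_pubkey, which is a made-up helper for this page (not a standard command), and runs it against a scratch directory with a fake key line. Unlike a bare cat >>, it skips the append if the key is already listed, so running it twice doesn't leave duplicates. Many systems also ship an ssh-copy-id command that automates this; whether it's installed on your machine is something to check.

```shell
# install_pubkey KEYFILE SSHDIR   (hypothetical helper, for illustration)
# Append the public key in KEYFILE to SSHDIR/authorized_keys, unless that
# exact line is already there, and set owner-only permissions.
install_pubkey() {
    keyfile=$1
    sshdir=$2
    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    touch "$sshdir/authorized_keys"
    # -x: match whole lines, -F: plain strings (no regex), -f: read them from KEYFILE
    if ! grep -qxFf "$keyfile" "$sshdir/authorized_keys"; then
        cat "$keyfile" >> "$sshdir/authorized_keys"
    fi
    chmod 600 "$sshdir/authorized_keys"
}

# Try it on a scratch directory with a made-up key line (not a real key).
scratch=$(mktemp -d)
echo "ssh-rsa AAAAexamplekeydata user@laptop" > "$scratch/pubkey.txt"
install_pubkey "$scratch/pubkey.txt" "$scratch/dot-ssh"
install_pubkey "$scratch/pubkey.txt" "$scratch/dot-ssh"   # second call adds nothing

# Count the lines in the resulting file
wc -l < "$scratch/dot-ssh/authorized_keys"
```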

Another, optional step is to set up even more shortcuts. You can specify nicknames for commonly used ssh servers, and set default usernames (since you still have to type in your username every time if it's not the same as the one on your local machine, which it probably isn't for us). This information goes in the file ~/.ssh/config. Here are the contents of mine:

Host bluehive
  HostName bluehive.crc.rochester.edu
  User dkleins2

Host slate
  HostName slate.hlp.rochester.edu

Because my username on my laptop is dkleinschmidt, this saves me from having to type in my netID every time I log into bluehive, and it saves me from having to type that .crc.rochester.edu garbage. Instead, I type ssh bluehive, and I'm in!
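If a nickname doesn't behave the way you expect, you can ask ssh what it thinks the settings are without connecting to anything. The sketch below copies the bluehive entry into a scratch config file (so your real ~/.ssh/config is left alone) and uses the -G flag, which assumes a reasonably recent OpenSSH; the scratch file location is just for this demo.

```shell
# Write a copy of the bluehive entry to a scratch config file.
cfg_dir=$(mktemp -d)
cat > "$cfg_dir/config" <<'EOF'
Host bluehive
  HostName bluehive.crc.rochester.edu
  User dkleins2
EOF

# -F FILE : read this config file instead of ~/.ssh/config
# -G      : print the settings ssh would use for this host, then exit
#           without connecting
resolved=$(ssh -F "$cfg_dir/config" -G bluehive)

# Keep only the two settings the nickname is supposed to fill in.
echo "$resolved" | grep -E '^(hostname|user) '
```

To check your real setup, drop the -F option and run ssh -G bluehive directly.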

Running interactive jobs

Bluehive

On Bluehive, running R interactively is really easy. ssh into Bluehive and run the command qR:

[dkleins2@bluehive ~]$ qR

qR: Starting R with 1 processors and 2 GB of RAM..

If you want more than one node (up to 8), give the number of nodes as an argument. For instance,

qR 8

reserves 8 nodes.

You can now interact with R in the same way you would when running it in the terminal on your laptop. You can also install your own site packages exactly as you do on your own computer, using the install.packages() command. There was at one point some funny business about lme4 not installing correctly because of a mismatched version of the Matrix library, but hopefully that has been resolved (DAN??).

SSH display tunneling

It's even possible to interact with R (or whatever other software you please) graphically using X-windows tunneling. Make sure X11 is running (if you're on a Mac), and when you connect to the server, add the -Y flag to the ssh command:

ssh -Y username@bluehive.crc.rochester.edu

Not quite as pretty as the Mac-native Quartz graphics, and probably a little sluggish, but it gets the job done.

Batch jobs

Batch jobs are a little more complicated. Bluehive and the CS cluster both use the TORQUE queueing system. Many users want to use the clusters at the same time, and TORQUE provides a system for more-or-less fairly dividing the computing resources among the different users. In order to reserve time on the cluster, you submit a small script with some information about how much computing power you need, how long you need it for, etc., as well as the shell commands that actually run your job. There is a good tutorial on the CRC website about how to submit jobs to Bluehive, but we'll go over an example here which should tell you everything you need to know to run R scripts.

All of the files for this section are in this tarball: demo.tar.gz

Example PBS file

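# Queue to submit the job to
#PBS -q standard
# Resources: one node, one processor per node
#PBS -l nodes=1:ppn=1
# Maximum run time (hours:minutes:seconds)
#PBS -l walltime=1:00:00
# File to write the job's output to
#PBS -o samp-examp.log
# Name of the job, as shown in the queue
#PBS -N samp-examp
# Merge the error stream into the output file
#PBS -j oe
#
# Start in the directory the job was submitted from
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
# Set up the module system, then load R
. /usr/local/modules/init/bash
module load R

# Run the R script non-interactively
R CMD BATCH ~/cluster_demo/sampler-example.R

exit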

The PBS file looks generally

Depending on how much use the cluster is getting, your job may run immediately (most of the time this is the case for small jobs on Bluehive), or it may have to wait in the queue until resources free up.
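To actually hand a PBS file to the queue and see whether it's running or waiting, TORQUE provides the qsub and qstat commands. Here is a minimal sketch: it writes a tiny PBS file to a scratch directory and submits it only if qsub is available, so it's harmless to run on your laptop. The hello-demo job name, the hello.pbs file name, and the five-minute walltime are made up for the demo; the standard queue is borrowed from the example above and may not exist on every cluster.

```shell
# Write a tiny PBS file (job and file names are made up for this demo).
job_dir=$(mktemp -d)
cat > "$job_dir/hello.pbs" <<'EOF'
#PBS -q standard
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:05:00
#PBS -N hello-demo
#PBS -j oe
cd $PBS_O_WORKDIR
echo Running on host `hostname`
EOF

# qsub hands the file to TORQUE and prints the new job's ID;
# qstat -u lists that user's queued and running jobs.
if command -v qsub >/dev/null 2>&1; then
    job_id=$(cd "$job_dir" && qsub hello.pbs)
    echo "Submitted job $job_id"
    qstat -u "$USER"
else
    echo "qsub not found: not on a cluster login node, so nothing was submitted"
fi
```

For your own jobs, the whole thing boils down to running qsub on your PBS file from the directory you want the job to start in, then checking on it with qstat.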
