Cluster computing: when your dinky laptop just isn't enough

We sometimes run into situations where it's necessary to, say, fit a whole series of mixed-effects regression models to a very large dataset, or do lots of sampling to estimate a Bayesian model.

Our personal computers are generally good enough for simple tasks, but for computationally intensive tasks like these they're not always big or fast enough. For this, we have clusters: many smaller computers which have been taped together. There are two clusters that people in the lab generally use: the Computer Science cluster and the University's Center for Integrated Research Computing cluster (Bluehive). The Center also has a supercomputer (BlueGene), which you probably won't be allowed to use.

We'll assume that you already have an account, or know how to get one, and will walk through logging into the cluster, running things interactively, submitting batch jobs, and writing (at least minimally) parallel code.

Logging in

You'll use ssh. On a Mac, this is as easy as opening up a Terminal, typing in ssh username@bluehive.crc.rochester.edu (to log into Bluehive), and then typing in your password when prompted.

This process can be made much more secure (and faster, if you like to live dangerously or edit your ~/.profile a lot) by using ssh keys. ssh keys replace the normal password exchange that takes place when you log in to a server using ssh with public-key cryptography. You will have two keys: a public key that everyone can see, and a private key that no one else can ever be allowed to see. You will distribute your public key far and wide, putting a copy of it on every server you want to log into, while your private key will sit securely (and if you're really paranoid, encrypted-ly) on your computer. The public key is a very large number, which the server uses to encode a small message that is very, very difficult to decode without the private key (unless you have a quantum computer). Being able to decode that message is what proves to the server that you hold the private key.

If you haven't already done so, you need to create your public-private key pair:

cd ~/.ssh
ssh-keygen

This will prompt you to enter a passphrase. If you do enter a passphrase, you will either have to type it in every time you log in to a server using these keys, or futz with ssh agents. If you don't enter a passphrase, then you will never have to enter your password again when using these keys. The trade-off is that anyone who can read your private key file (i.e., someone logged into your computer as you) can then log into every server which uses that key pair and cause all kinds of mischief.
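If you want to see what ssh-keygen produces before touching your real ~/.ssh, here is a minimal sketch that generates a throwaway key pair in a scratch directory. The file name id_demo and the comment string are made up for illustration, and the empty passphrase is only there so the command runs without prompting; this is not a recommendation for your real keys.

```shell
# Scratch directory, so the real ~/.ssh is left alone.
# The name id_demo is hypothetical; real keys usually live in ~/.ssh.
demo_dir=$(mktemp -d)

# -N "" sets an empty passphrase (no prompt); put a real passphrase
# there instead to have the private key encrypted on disk.
# -f names the private key file; the public key gets the same name plus .pub.
ssh-keygen -q -t rsa -b 3072 -N "" -C "demo key" -f "$demo_dir/id_demo"

# Two files: id_demo (private, keep secret) and id_demo.pub (public, share freely).
ls -l "$demo_dir"

# Print the key's fingerprint, a short summary that is handy for checking
# which key is which.
ssh-keygen -l -f "$demo_dir/id_demo.pub"
```

The private key file should come out readable only by you, which is worth checking in the ls output, since that file is the one that has to stay secret.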

The next step is to distribute your public key, which is probably in a file called ~/.ssh/id_rsa.pub, to the server.
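With a default OpenSSH setup, distributing the key amounts to appending the contents of the .pub file to ~/.ssh/authorized_keys in your home directory on the server. Here is a minimal sketch of that step. The function name install_pubkey, the fake key, and the scratch directories are all made up for illustration, and the demo runs entirely on your own machine rather than on the cluster.

```shell
# install_pubkey PUBKEY_FILE SSH_DIR   (hypothetical helper, not a standard tool)
# Appends the public key to SSH_DIR/authorized_keys unless it is already there.
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    # sshd tends to refuse keys if these are writable by other users.
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    # -x: match whole lines, -F: fixed strings, -f: read patterns from the key file
    if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
}

# Demo with a fake key and a pretend server home directory.
scratch=$(mktemp -d)
printf '%s\n' 'ssh-rsa AAAAB3FAKEKEYDATA you@laptop' > "$scratch/id_demo.pub"
install_pubkey "$scratch/id_demo.pub" "$scratch/server_home/.ssh"
install_pubkey "$scratch/id_demo.pub" "$scratch/server_home/.ssh"  # second call adds nothing
cat "$scratch/server_home/.ssh/authorized_keys"
```

On a real server, the ssh-copy-id tool that comes with most OpenSSH installations does roughly the same thing over the network, for example ssh-copy-id -i ~/.ssh/id_rsa.pub username@bluehive.crc.rochester.edu, if it is available on your machine.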
