Useful Unix commands
One of the wonderful things about Unix is that there is a command to do just about anything you could ever conceive of doing. The tricky part is figuring out what the command is called and what switches it takes to do what you want. This page provides a primer on using some of the most important and useful commands to know.
For more information you can use the man command on Unix (e.g. man wc for help on the wc command), although sometimes man pages can be difficult to understand. Another good resource is to search for the command on Wikipedia.
Before you do much of anything else you need to know where you are and how to get somewhere else. Here are some commands to help you get around.
The ls command lists the contents of the current directory. Useful flags include -l, which outputs the listing in long format and gives more info than just filenames; -h, which in concert with -l shows file sizes in a more human readable format (i.e. bytes, kilobytes, megabytes, etc. instead of disc blocks or byte count); and -a, which shows hidden files. For all of these commands, flags can generally be combined, e.g. ls -lha shows the directory listing in long format with human readable file sizes and includes hidden files.
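As a rough sketch of the flags described above, the following script lists a scratch directory in a few different ways. The file names are made up for illustration, and the script assumes your system provides mktemp -d for creating a temporary directory.

```shell
#!/bin/sh
# Scratch directory so the listing is predictable; file names are hypothetical.
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/.hidden"

# Plain ls normally skips names that start with a dot.
plain=$(ls "$workdir")
echo "$plain"

# -a includes hidden files (and the special . and .. entries).
all=$(ls -a "$workdir")
echo "$all"

# Flags combine: long format, human readable sizes, hidden files.
ls -lha "$workdir"

rm -r "$workdir"
```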
1.2. cd and chdir
cd is short for change directory. (In some shells it is also called chdir.) The directory can be specified relative to the current directory (e.g. to go to the foo subdirectory of the current directory you'd type cd foo), relative to the root directory of the system (e.g. to go to the bin subdirectory of the usr directory you'd type cd /usr/bin), relative to your home directory (e.g. to go to the foo subdirectory of your home directory from anywhere on the filesystem you'd type cd ~/foo), relative to another user's home directory (e.g. to go to the foo subdirectory of a user named bob's home directory you'd type cd ~bob/foo), or relative to the special names . (this directory) and .. (parent directory). You can use the parent directory name an arbitrary number of times (e.g. to go up three directory levels, type cd ../../../).
There are also a couple of nice shortcuts. If you type cd with no arguments it always takes you back to your home directory. If you type cd - it will take you to the last directory you were in before the current one. This can be useful if you were working in a directory like /Library/Frameworks/Python.framework/Versions/Current/lib/python2.4/, went back to your home directory to do something, and want to get back without typing all that again.
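The relative paths and the cd - shortcut can be tried out safely in a throwaway directory. This is only a sketch: the foo/bar/baz layout is hypothetical and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical directory layout under a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/foo/bar/baz"
start=$PWD

cd "$workdir/foo/bar/baz"   # absolute path
deep=$(pwd)

cd ../../..                 # up three levels, back to the scratch directory
up=$(pwd)

cd foo                      # relative to the current directory
cd - > /dev/null            # back to the previous directory (cd - prints its name)
back=$(pwd)

echo "$deep"
echo "$up"
echo "$back"

cd "$start"
rm -r "$workdir"
```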
The pushd command puts the current directory on the top of the directory stack and changes to the directory given as an argument.
The popd command takes the top directory off the directory stack and changes to that directory.
pwd is probably the simplest command discussed here. All it does is print the working directory, i.e. show you where you are in the file system.
2. File manipulation
Glob is not a command, but it's an important concept for file manipulation. A glob is a way of specifying multiple files that have something in common in their names. The two most common globbing characters are "*", which matches any number of characters, and "?", which matches a single character. For example, you would type ls *.txt to see a listing of all files in the current directory that have an extension of ".txt". The shell expands the glob into the list of matching names before the command runs.
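Because the shell does the expanding, echo is enough to see what a glob matches. A small sketch, using made-up file names in a scratch directory (mktemp -d is assumed to exist):

```shell
#!/bin/sh
workdir=$(mktemp -d)
start=$PWD
cd "$workdir"
touch a.txt b.txt ab.txt notes.md   # hypothetical files

star=$(echo *.txt)       # * matches any run of characters: all three .txt files
question=$(echo ?.txt)   # ? matches exactly one character: a.txt and b.txt only
echo "$star"
echo "$question"

cd "$start"
rm -r "$workdir"
```

The order in which the matches are listed depends on your locale settings, so it may differ between systems.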
The mv command moves a directory or file. As arguments it takes the source and the destination. It can also be used to rename a directory or file (e.g. mv foo bar renames the file foo to bar).
The cp command copies files or, with the -R flag, whole directories. As arguments it takes the source and the destination. It can take a list of files and will interpret the last argument as the destination and the rest as the sources.
The rm command removes (aka deletes) a file. This can be a highly dangerous command, because unlike moving things to the Trash Can on a Mac or the Recycle Bin on Windows, the file is gone as soon as you hit enter. Combined with globbing it becomes even more dangerous. Always double check (or triple check) your command line when rming files.
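One possible way to double check a glob before deleting is to put echo in front of the command, so the shell shows exactly what rm would receive. The sketch below uses hypothetical file names in a scratch directory and assumes mktemp -d is available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
start=$PWD
cd "$workdir"
touch draft1.tmp draft2.tmp keep.txt   # hypothetical files

# Preview: echo prints the command line after the glob has been expanded.
preview=$(echo rm -- *.tmp)
echo "$preview"

# Only after checking the preview, run the real command.
rm -- *.tmp
remaining=$(ls)
echo "$remaining"

cd "$start"
rm -r "$workdir"
```

The -- marks the end of the flags, so a file whose name starts with a dash is not mistaken for one. rm -i, which asks for confirmation before each removal, is another option.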
The mkdir command makes a directory of the specified name.
The rmdir command removes a directory. The directory must be empty to be removed, which can be inconvenient. A much more common way to delete a directory is to use the command rm -rf, which removes files (and directories) recursively through the directory structure and forces the delete if need be. This is possibly the most dangerous Unix command. A misplaced space on the command line and you could delete every file and directory in the current directory. For example, if you typed rm -rf * .txt (note the space between "*" and ".txt") when you meant rm -rf *.txt, you would delete all files and then try to delete a file called ".txt". If you were logged in as a superuser and accidentally typed rm -rf / foo (again, note the extraneous space) when meaning to delete /foo, you could delete all files on the system! (Some newer versions of rm refuse to operate on / unless explicitly told to, but don't rely on that.)
The touch command updates the timestamp on a file to the time that you run the command. If the file doesn't exist, it is created. Usually the command is used to create an empty file of the specified name.
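The file manipulation commands above fit together roughly as follows. This is a sketch with made-up names, run inside a scratch directory created with mktemp -d (assumed to be available).

```shell
#!/bin/sh
workdir=$(mktemp -d)
start=$PWD
cd "$workdir"

mkdir project                            # make a directory
touch project/a.txt                      # create an empty file
cp project/a.txt project/b.txt           # copy a file
mv project/b.txt project/c.txt           # rename (move) the copy
mkdir backup
cp project/a.txt project/c.txt backup/   # last argument is the destination
files=$(ls backup)
echo "$files"

rm backup/a.txt backup/c.txt
rmdir backup                             # works only because backup is now empty
if [ -d backup ]; then gone=no; else gone=yes; fi
echo "$gone"

cd "$start"
rm -r "$workdir"
```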
The name tar stands for tape archive. It was originally written to serialize a number of files together into one file for the purpose of putting them on a tape backup that was easy to restore from. These days it's usually used to archive files together, similar to a Zip or Rar file. To create a tar file, type tar cvf filename.tar filenames; to unarchive a tar file, type tar xvf filename.tar. Most of the time you'll want to compress the file. In that case add the z option for gzip or the j option for bzip2 (e.g. to create a bzip2'd tar of the directory foo, type tar cvjf foo.tbz foo/). By convention, the extension for gzipped tar files is .tar.gz or .tgz and the extension for bzip2'd tar files is .tar.bz2 or .tbz.
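A round trip through a gzipped tar file might look like the sketch below. The directory and file names are hypothetical, mktemp -d is assumed to be available, and two options not covered above are used: t, which lists an archive's contents, and -C, which chooses the directory to extract into.

```shell
#!/bin/sh
workdir=$(mktemp -d)
start=$PWD
cd "$workdir"
mkdir foo restore
echo "hello" > foo/hello.txt

tar czf foo.tar.gz foo/            # c = create, z = gzip, f = archive file name
listing=$(tar tzf foo.tar.gz)      # t = list the contents without extracting
echo "$listing"

tar xzf foo.tar.gz -C restore      # x = extract, -C = directory to extract into
restored=$(cat restore/foo/hello.txt)
echo "$restored"

cd "$start"
rm -r "$workdir"
```

The v (verbose) option is left out here only to keep the output short.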
3. Ownership and Permissions
Access to Unix files has traditionally been controlled by permissions based on three levels called user, group, and other. The output of ls -l gives you lines like the following:
-rwxr-xr-x 1 root wheel 2064128 Oct 17 2006 vim
The -rwxr-xr-x part is the list of permissions for the file. There are ten slots in the permissions. The first slot tells what kind of file it is. - indicates a normal file, d a directory, l a symbolic link to another file. The other nine slots are three groups of three permissions, one each for user, group, and other. Within each group slot 1 is read, slot 2 is write, and slot 3 is execute. If there is a - in a slot it indicates that the permission is not set. So, in the case of the file in the example, it's a normal file that the owner (aka user) can read, write, and execute, the group and everyone else (aka other) can read and execute.
The root wheel part gives the owner and group of the file. Our example file is owned by root, who is a special user called the superuser. The file belongs to the wheel group, which is a special systems administrator group on BSD Unixes.
The 1 between the permissions and the owner is the number of hard links to the file. The rest of the information after the ownership part tells you the file size (2064128, in bytes), the last time the file was modified (Oct 17 2006) and the filename (vim).
To change the owner of a file, use the chown command. As arguments it takes the username to give the file to and the file(s) to change. Optionally you can change the group at the same time by putting a colon followed by the group name after the username (e.g. chown user:group filename). When used on a directory the -R flag performs the change recursively for all files through all subdirectories of the specified directory.
If you just want to change the group a file belongs to, use the chgrp command. It works similarly to chown.
The chmod command changes the mode (aka permissions) on a file you own. You can specify the permissions two different ways. The more straightforward way is to use the letters u, g, o to specify user, group, other, and r, w, x to specify read, write, execute, and + and - to give and take away permissions. (e.g. to give the group and others read and execute permissions do chmod go+rx filename; for the group alone, use g+rx.) The more compact way involves numeric permissions. Read has a value of 4, write 2, and execute 1. They are added to give a value from 0 to 7 for each of user, group, and other, and the three digits are written together. (e.g. to set the mode rwxr-x--- on a file do chmod 750 filename, since 4+2+1 = 7, 4+1 = 5, and no permissions = 0.)
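To see both styles of chmod at work, the sketch below changes the mode of a scratch file and reads the result back from the first ten characters of ls -l. The file name is hypothetical and mktemp -d is assumed to be available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
script="$workdir/run.sh"     # hypothetical file name
touch "$script"

chmod 750 "$script"          # 7 = rwx (user), 5 = r-x (group), 0 = --- (other)
numeric=$(ls -l "$script" | cut -c1-10)
echo "$numeric"

chmod go+rx "$script"        # add read and execute for group and other
symbolic=$(ls -l "$script" | cut -c1-10)
echo "$symbolic"

chmod o-rx "$script"         # take them away from other again
final=$(ls -l "$script" | cut -c1-10)
echo "$final"

rm -r "$workdir"
```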
The su command lets you temporarily change your identity. This is useful for running commands or reading files that you don't normally have permission to use. You must provide the password of the user you want to su to. Given that you should never under any circumstances give your password to anyone else, this command is of little use to most users. It is primarily used by administrative users to change to being the root user.
The sudo command lets you run a command as another user. The most common use is to run a command as root, which is the default. There is a file called /etc/sudoers that controls who can use sudo to run what commands. (The sudoers file can only be read or edited by administrative users.) Normal users are very restricted in what they can do. sudo -l shows a list of commands you can sudo.
4. Text munging
An overarching theme in the design of Unix is that everything is a stream of characters. Consequently there are a number of commands that allow you to manipulate character streams.
4.1. | ("pipe")
Most Unix commands are short, but pipe is just a single character, |, which is usually located on the same key as the backslash. The pipe allows you to send the output of one command to the input of another. You can string a number of commands together this way to make a pipeline where only the final result is printed.
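A small pipeline, sketched here with made-up input, shows the idea: each command reads what the previous one wrote, and only the last command's output is seen.

```shell
#!/bin/sh
# printf produces four lines, grep keeps those starting with "a",
# wc -l counts them, and tr strips the padding some versions of wc add.
count=$(printf 'apple\nbanana\navocado\ncherry\n' | grep '^a' | wc -l | tr -d ' ')
echo "$count"
```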
4.2. more and less
The more and less commands output the contents of a file one screenful at a time. more is the older command and can only go forward through a file, i.e. once text has scrolled by you can't get back to it. less is a more sophisticated replacement that allows you to move forward and backward through the file and accepts vi-like commands. Hit h while less is running to see all the commands it accepts. On some systems (e.g. MacOS X) more has been aliased to less.
The cat command is short for concatenate, although in practice that isn't too helpful to know. What it's useful for is writing the contents of a file to standard output, which is generally then piped to another command.
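If you have a file of compressed text you don't even have to decompress it to cat it, as most systems now have zcat, gzcat, and bzcat. gzcat and bzcat handle files compressed with gzip and bzip2 respectively; zcat handles gzip files on many systems (e.g. Linux) and files compressed with the older compress utility on others.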
4.4. head and tail
The head command shows you the first n lines of a file, where the default is 10. You can specify n with the -n switch. The tail command does the same, but for the last n lines instead.
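For instance, with a scratch file holding the numbers 1 to 20 (the file name is hypothetical and mktemp -d is assumed to be available), head and tail behave roughly like this:

```shell
#!/bin/sh
workdir=$(mktemp -d)
file="$workdir/numbers.txt"

# Write the numbers 1 to 20, one per line.
i=1
while [ "$i" -le 20 ]; do echo "$i"; i=$((i + 1)); done > "$file"

first=$(head -n 3 "$file")                     # first three lines
last=$(tail -n 2 "$file")                      # last two lines
default=$(head "$file" | wc -l | tr -d ' ')    # no -n given: ten lines
echo "$first"
echo "$last"
echo "$default"

rm -r "$workdir"
```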
The grep command searches globally for a regular expression and prints matches. The two most common uses for it are to search for files that contain a specific string or match a regular expression, or to do the same for a stream piped into it.
Like cat, grep now has variants that handle compressed files. They are called zipgrep, zgrep, bzgrep for zip, gzip, and bzip2 respectively.
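The two common uses of grep, searching files and searching a piped stream, can be sketched as follows. The log file names and contents are invented for the example, and mktemp -d is assumed to be available. The -c, -i, and -l flags used here are standard grep options not covered above.

```shell
#!/bin/sh
workdir=$(mktemp -d)
start=$PWD
cd "$workdir"
printf 'error: disk full\nok\nError: timeout\n' > app.log
printf 'ok\nok\n' > other.log

matches=$(grep 'error' app.log)      # lines that contain the string
count=$(grep -ci 'error' app.log)    # -c counts matching lines, -i ignores case
files=$(grep -l 'error' *.log)       # -l prints only the names of matching files
piped=$(printf 'ok\nnot ok\n' | grep -c '^ok$')   # regular expression on a stream
echo "$matches"
echo "$count"
echo "$files"
echo "$piped"

cd "$start"
rm -r "$workdir"
```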
The diff command shows you the differences between two files. Useful flags include -b (ignore changes in white space), -w (ignore all white space), and -B (ignore changes that are blank lines).
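As a sketch of the -b flag, the two scratch files below differ only in the amount of white space between words. diff exits with status 0 when it finds no differences and 1 when it finds some, which the script uses to report the result. File names are hypothetical and mktemp -d is assumed to be available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
old="$workdir/old.txt"
new="$workdir/new.txt"
printf 'alpha beta\n' > "$old"
printf 'alpha    beta\n' > "$new"     # same words, more spaces

# Plain diff reports the lines as different.
if diff "$old" "$new" > /dev/null; then plain=same; else plain=different; fi

# With -b, changes in the amount of white space are ignored.
if diff -b "$old" "$new" > /dev/null; then ignoring=same; else ignoring=different; fi

echo "$plain"
echo "$ignoring"

rm -r "$workdir"
```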
As you would expect, sort takes its input and outputs it in sorted order. Use sort -r to sort in reverse, and sort -f to fold upper and lower case together (meaning that 'A' and 'a' are treated the same; in the traditional C locale the default is for A-Z to come before a-z, while other locale settings may order letters differently).
The uniq command filters its input such that if multiple consecutive lines are the same it only prints one instance to the output. To be useful the input needs to be sorted already, and as such it is usually used in a pipeline after sort.
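The sketch below, on made-up input, shows why uniq is usually placed after sort, and adds the standard -c flag (not covered above), which prefixes each line with the number of times it occurred.

```shell
#!/bin/sh
words=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# Without sort, no two identical lines are adjacent, so uniq removes nothing.
unsorted=$(printf '%s\n' "$words" | uniq | wc -l | tr -d ' ')

# After sort, duplicates are adjacent and collapse to one line each.
unique=$(printf '%s\n' "$words" | sort | uniq)

# uniq -c counts occurrences; sort -rn puts the highest count first.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn | head -n 1)

echo "$unsorted"
echo "$unique"
echo "$top"
```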
wc is short for word count. The default output is the number of lines, words, and bytes in a file. You can use the -l (lines), -w (words), -c (bytes) flags to limit what is shown. You can use the -m flag for number of characters, which only differs from the byte count for text files with multibyte characters, e.g. Chinese or Japanese.
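A quick sketch of the three main flags, using a two-line scratch file (hypothetical name; mktemp -d is assumed to be available). Reading the file through < keeps wc from printing the file name, and tr removes the padding that some versions of wc add.

```shell
#!/bin/sh
workdir=$(mktemp -d)
file="$workdir/sample.txt"
printf 'one two\nthree\n' > "$file"

lines=$(wc -l < "$file" | tr -d ' ')   # 2 lines
words=$(wc -w < "$file" | tr -d ' ')   # 3 words
bytes=$(wc -c < "$file" | tr -d ' ')   # 14 bytes, newlines included
echo "$lines $words $bytes"

rm -r "$workdir"
```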
The cut command is used to cut chunks out of lines in a file. The most common flags to use are -f, which is by field (tab being the default field separator) and -c, which is by character or character range.
The paste command takes an arbitrary number of filenames as arguments and outputs sequentially correspondent lines concatenated together, separated by tabs. In essence, the inverse of cut -f.
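The relationship between cut and paste can be sketched with a small tab-separated file: cut splits it into columns and paste joins them back together. The file names and contents are invented for the example, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
start=$PWD
cd "$workdir"
printf 'alice\t30\nbob\t25\n' > people.tsv

cut -f1 people.tsv > names.txt       # first tab-separated field
cut -f2 people.tsv > ages.txt        # second field
initials=$(cut -c1 names.txt)        # first character of each line

rejoined=$(paste names.txt ages.txt) # corresponding lines joined by a tab
original=$(cat people.tsv)
echo "$initials"
echo "$rejoined"

cd "$start"
rm -r "$workdir"
```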
sed is short for stream editor. It's actually a full-fledged programming language, but it's often used to write one line scripts to manipulate a character stream. Highly useful to know how to use, but too complex to discuss in detail here.
Like sed above, awk (short for Aho, Weinberger, and Kernighan, last names of its creators) is a full programming language that can also be used inline for character stream manipulation. Again, you should learn it, but it's too complex for this page.
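Without going into either language, a few one-liners give a flavor of how sed and awk are typically used in a pipeline. These are only minimal sketches on made-up input.

```shell
#!/bin/sh
# sed: substitute the first match of a pattern on each line.
replaced=$(printf 'hello world\n' | sed 's/world/Unix/')

# awk: fields are split on white space; $2 is the second field.
second=$(printf 'a b c\n' | awk '{ print $2 }')

# awk: add up the second column and print the total at the end.
total=$(printf 'apples 3\npears 4\n' | awk '{ sum += $2 } END { print sum }')

echo "$replaced"
echo "$second"
echo "$total"
```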
5. Job control
Like Windows and MacOS X, Unix is a multi-user preemptive multitasking operating system. This means that multiple people can be using the system at the same time and that multiple programs can be running concurrently. To allow multiple programs for the same user at the command line, a number of programs exist to manage jobs.
Short for process status, ps shows a list of processes and some information about them. With no arguments it shows basic information about processes belonging to the user running the command. It takes a number of different flags about what to show. Some common and useful flags include a (show all users' processes), u (adds a host of other information about the processes), w (wide, 132 columns instead of 80), and x (also show processes that have no controlling terminal, such as daemons). These can be combined to make commands like ps auwx, which gives rather a lot of information about every running process.
The top command gives information about system resource usage sorted by top users of a particular resource, task CPU time by default. It is good for helping to identify resource hogging programs on the system so they can be dealt with.
There are several versions of top floating around, so check the man page on the system you are on to see what flags and runtime options it accepts. You can always get a brief list of options by hitting ? or h while it is running.
For the purposes of job control, each job started from your current shell is given a job number. To see a list of all of your jobs in that shell use the jobs command.
The ampersand is not a command; it's a job control flag for the shell. If you put a & at the end of your command it will run in the background, freeing you to run other commands without opening a new terminal.
If you are running one program and want to run another without quitting the first, use ^Z (that is, hold down the Control key and hit z) to stop (but not quit) the job and return to the command line.
A job that has been stopped is not getting any CPU time. This may be fine for something like a text editor, but sometimes you have something like a script that has to process a lot of data that can safely be run in the background with no user interaction. In this case use the bg command. With no arguments it backgrounds the most recently stopped job; it can also take a job number to background.
If you want to bring a job to the foreground, use the fg command. With no arguments it foregrounds the most recently stopped or backgrounded job; it can also take a job number to foreground.
If you want to adjust how much priority your process gets, use the nice command followed by your command's name. "Niceness" ranges from -20 to 19, with the lower numbers getting more priority. The default starting priority is 0; with no flags, nice runs your command at a priority of 10. Only a superuser can give a process a negative priority.
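jobs, ^Z, bg, and fg are meant for interactive use, so they are hard to show in a script. The background operator & does work in scripts, though, and the sketch below uses it together with two standard shell features not covered above: $!, which holds the process ID of the most recent background command, and wait, which pauses until that process finishes.

```shell
#!/bin/sh
sleep 1 &              # & runs the command in the background
pid=$!                 # process ID of the background command
echo "started $pid"    # the shell is free to do other work meanwhile

wait "$pid"            # block until the background command finishes
status=$?              # its exit status; 0 means success
echo "$status"
```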
To adjust the priority of a running process, use the renice command. See the man page for more detail.
Sometimes you have to kill a hung process. The kill command does this for you, given the process ID (as shown by ps). It takes a variety of flags for what kind of kill signal to send to the process. The default is 15 (aka TERM), which asks the program to quit and gives it a chance to run any cleanup code it has. A program can ignore this signal and a truly hung program can miss it. If plain kill fails you have to use signal 9 (aka KILL), which causes the OS to kill it unconditionally. This can leave things in a messy state if the process had any files open, so use kill -9 sparingly. Some special programs that run constantly in the background (called daemons) also accept signal 1 (aka HUP or hangup), which causes them to reset. Most other programs quit on receiving a HUP signal.
Several Control key sequences also send signals. For example, ^Z (discussed above) sends the TSTP (stop) signal to the current foreground process and ^C sends the INT (interrupt) signal, which tells the process to quit.
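The TERM and KILL signals sent by kill can be tried on a harmless background sleep. The sketch relies on a convention of common shells such as bash and dash (an assumption about your shell): when a process is ended by a signal, wait reports 128 plus the signal number.

```shell
#!/bin/sh
sleep 30 &
pid=$!
kill "$pid"              # default signal is TERM (15)
wait "$pid"
term_status=$?           # 128 + 15 = 143 in bash and dash
echo "$term_status"

sleep 30 &
pid=$!
kill -9 "$pid"           # KILL (9) cannot be caught or ignored
wait "$pid"
kill_status=$?           # 128 + 9 = 137 in bash and dash
echo "$kill_status"
```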
When a process is started as an argument to the nohup command, it ignores the HUP (hangup) signal. Normally HUP is sent to all processes belonging to a user on a specific terminal at logout or disconnect, to prevent orphaned (i.e. parentless) processes from running amok. nohup is useful if you have a large data processing job that needs no user interaction and you want to log out while it's running, or if you could be logged out automatically for being idle too long, as happens on the CS systems here.
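A typical way to launch such a job, sketched below, combines nohup with output redirection and &. The job here is just a stand-in (a short sleep followed by an echo), the log file name is hypothetical, and mktemp -d is assumed to be available. In a real session you would log out instead of waiting for the job.

```shell
#!/bin/sh
workdir=$(mktemp -d)
log="$workdir/job.log"

# Run the job immune to HUP, in the background, with all output sent to a log.
nohup sh -c 'sleep 1; echo finished' > "$log" 2>&1 &
pid=$!

wait "$pid"                          # only so the sketch can inspect the log
result=$(grep -c 'finished' "$log")  # number of log lines containing "finished"
echo "$result"

rm -r "$workdir"
```

If the output is not redirected, nohup writes it to a file called nohup.out.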