Introduction to UNIX (and Linux)
UNIX is a generic term for a whole series of similar but distinct
computer operating systems (OSes) that originated from Ken Thompson's
and Dennis Ritchie's early work on OSes at Bell Labs in the late
1960s. (Click here
for an interesting and much more complete history of UNIX.) Some of
the features of UNIX that have made it popular are: multitasking - the
ability to do many different things at once; multiuser - more than one
person can use the computer at the same time, or at different times;
portability - UNIX and linux run on almost every computer
architecture ever invented; and for every UNIX flavor there is a
large built-in suite of powerful, free programs. There have been many
UNIXes designed over the years for a whole host of (expensive) computer
hardware architectures that you have probably never heard
of. Relatively recently, however, operating systems based more or less
on UNIX have begun to invade ordinary people's lives in the form of
linux and Apple Computer's new Mac OS X. (Since, for the purposes of
this class, UNIX and linux are indistinguishable, I will use them
interchangeably. Many people do.)
Linux was started as a hobby of a guy called Linus Torvalds in 1991 as
a way to use a free UNIX-like OS on computers that regular people
used, like the PCs that your Mom would use. It took off like a rocket
and today is wildly popular. But linux itself isn't a complete OS. The
reason that linux was successful, and the reason that we are able to
use it for this course, is Richard Stallman's GNU (which stands for GNU's
Not Unix) project. The GNU project is a *free* UNIX-like OS and
software suite that is maintained by a large number of programmers
located all over the world.
Like UNIX, there are many different flavors of linux that run on a
whole host of computer architectures. Unlike UNIX, most linux
distributions are completely free to download. Some, like the Red
Hat Linux that is installed on our lab machines, cost about $100 if you
want to buy CDs with the software instead of waiting for the
download. Included in that $100 is the ability to call someone and
complain when things don't work. I am not sure, but I am pretty
confident that Red Hat is the most popular version of linux out
there. So enjoy.
What you can do well with Linux:
- Create and edit files (like Word)
- Search through files (like "find in file" in Windows or on Macs)
- Organize files (like "My Computer" and "My Pictures" in Windows or on Macs)
- Run programs (like double clicking in Windows or on Macs)
- Email
- FTP
- Biological sequence analysis (quickly do large numbers of blast runs, clustalw, perl scripting, etc.)
What you can't do well with UNIX:
- Watch movies
- Play games
- Run Java applets of dancing bears in your Web browser
- and so on ...
So the take home message here is that you can do a lot fast with a
little practice with UNIX. It helps to know a little Perl. We'll get
to that.
The UNIX OS
The organization of the UNIX file system is very simple. It's
basically a whole bunch of directories (usually called folders in the
Windows and Mac world) organized into a tree structure. Here is a toy
UNIX file system that I will use as an example:
You'll notice right away that at the top of the tree is a directory
labeled / (which is called the root directory). It's the very
top of the UNIX file structure, and so it contains all of the
directories, and everything else, in the file system. The next level
down from the root directory contains many directories that have
important system programs in them and (at least in this example) a
directory that contains all of the users of the computer. For us, the
important level of directories is the next one down, the user
directories themselves (sudhir and john in the
example). I only mention the non-user directories for
completeness. You do not need, and it would be wise to stay out of,
any of the directories above your user directory. It's not likely, but
it's possible that you could accidentally mess things up. Bad. For
everyone.
When you log in, you will be in the equivalent directory of
john. (john is my login name, so my home directory is called
john.) This is where the action will happen. In this
directory I can make new directories to organize my stuff, move
around, make new programs, run programs, edit files, and so on. Your
home directory is the first "working directory" you will
encounter. The concept of the working directory is simple but
critical. Your current working directory is your location in the file
structure at a given time. Or, in other words, wherever you currently
are is your current working directory.
Getting started
By now, I'm sure you're on the edge of your seat saying, "when can we get
started?" I know, I know, learning UNIX has that effect on
people. First, open a new terminal, if one isn't open already (ask if
you don't know how to do this). You will see what's called a prompt
waiting for you to issue a command, e.g.:
[john@ccbtl11 john]$
A "terminal" is the means by which you talk to the computer. The
"prompt" or the "shell prompt" is where you type; a program called the
shell interprets what you type into language the computer understands
and then returns the results back to
you. You can open multiple terminals at once to make it easier to work
in different directories simultaneously. They will all open up in your
home directory.
Making directories and moving around
Issue the ls command to 'list' the contents of your
home directory.
[john@ccbtl11 john]$ ls
[john@ccbtl11 john]$
Nothing should be listed, because nothing is there. To make a new
directory, use the command mkdir.
So, for example, to make the directory taing, you would issue
the command
[john@ccbtl11 john]$ mkdir taing
[john@ccbtl11 john]$
Now, do another ls, and the directory taing should be there
[john@ccbtl11 john]$ ls
taing
[john@ccbtl11 john]$
Now I'll make the rest of the directories in the example.
[john@ccbtl11 john]$ mkdir crap rna mp3
[john@ccbtl11 john]$ ls
crap mp3 rna taing
[john@ccbtl11 john]$
You need to be able to move around into your newly-created
directories. This is done with the 'change directory' command,
cd. So, to change into the taing directory,
[john@ccbtl11 john]$ cd taing
[john@ccbtl11 taing]$
To see where you are in the directory tree, use the 'print working
directory' command, pwd
[john@ccbtl11 taing]$ pwd
/home/john/taing
[john@ccbtl11 taing]$
Now say you want to go back up to the john directory. To do this, you need to know how relative directory positions are described in UNIX.
The current directory is always referred to as . (That's
right, just a period.)
The directory above you is always called .. (That's two dots.)
So, in our example, to go back up to the john directory,
[john@ccbtl11 taing]$ cd ..
[john@ccbtl11 john]$ pwd
/home/john
[john@ccbtl11 john]$ ls
crap mp3 rna taing
[john@ccbtl11 john]$
Now, let's make a couple more directories, one in which to store all
of our perl programs that we'll write this semester and one to store
homework.
[john@ccbtl11 john]$ mkdir taing/perl taing/hw
[john@ccbtl11 john]$ ls taing
hw perl
[john@ccbtl11 john]$
Now you can see two directories listed in taing, hw
and perl. Let's move around a bit more, to get more
comfortable with it. I've numbered the lines here for ease of
discussion.
[01] [john@ccbtl11 john]$ cd taing/perl
[02] [john@ccbtl11 perl]$ pwd
[03] /home/john/taing/perl
[04] [john@ccbtl11 perl]$ cd ../hw
[05] [john@ccbtl11 hw]$ pwd
[06] /home/john/taing/hw
[07] [john@ccbtl11 hw]$ ls /
[08] bin home local nfs src usr
[09] [john@ccbtl11 hw]$ ls ../../../../
[10] bin home local nfs src usr
[11] [john@ccbtl11 hw]$ ls /home/john/
[12] crap mp3 rna taing
[13] [john@ccbtl11 hw]$ pwd
[14] /home/john/taing/hw
[15] [john@ccbtl11 hw]$ cd
[16] [john@ccbtl11 john]$ pwd
[17] /home/john
[18] [john@ccbtl11 john]$ cd taing/perl/
[19] [john@ccbtl11 perl]$ ls ~
[20] crap mp3 rna taing
[21] [john@ccbtl11 perl]$ cd ~
[22] [john@ccbtl11 john]$ pwd
[23] /home/john
In line [04], I moved directly from the perl directory into
the hw directory. In lines [07] and [09], I listed the entire
contents of the root directory, first by its complete path and then
by climbing up four levels with "..". In line [11], I listed the contents of
my home directory by giving the ls command what's called the
"complete path" of my home directory. In line [15] I just typed
cd. This always brings you back to your home directory. In
line [19] I listed the contents of my home directory from the
taing/perl directory by using the tilde "~". Tilde always
means home directory. Line [21] does the same thing as line [15].
From these examples, you can see that you are able to string together
any number of directories for moving around or listing the contents of
directories from any other location in the file structure. You just
need to know where you are going relative to where you are, or know
the complete path of where you are going. Play around with it.
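If you want to convince yourself that a relative path and a complete path really do land you in the same place, here is a small sketch you can paste into a terminal. It builds a throwaway copy of the example tree in a scratch directory; the scratch directory (made with mktemp) and the -p flag to mkdir (which makes any missing parent directories along the way) are my additions for the demo, not part of the lab setup.

```shell
# Build a throwaway copy of the example tree so nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/john/taing/perl" "$sandbox/home/john/taing/hw"

# Start out in the perl directory.
cd "$sandbox/home/john/taing/perl"

# Relative path: go up one level, then down into hw.
cd ../hw
via_relative=$(pwd)

# Complete path: spell out the whole route from the top of the sandbox.
cd "$sandbox/home/john/taing/hw"
via_absolute=$(pwd)

echo "relative path got me to: $via_relative"
echo "complete path got me to: $via_absolute"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

Both echo lines should print the same directory. Which style you use is mostly a matter of which one is less typing from where you happen to be.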
Making, editing, removing, and viewing the contents of files
OK, now you can make new directories to organize all your files. But
you need to know how to make the files, right?
One of the cool things about UNIX is that all files that you work on
are just plain text files. Have you ever written a paper in Word
version 5 and tried to open it in Word version 6? Sometimes it works,
sometimes it doesn't. It always works in UNIX, since all of the
files are in plain text, and always will be. You can make fancier
things with other programs in UNIX, but most of the real work can
easily be done in plain text. You will write your homework in this
class in plain text (at least if you want me to grade it!), program in
Perl using plain text, edit microarray data files that are in plain
text, and so on. And you will do this all in one program that edits
plain text files. It's called emacs. There are a lot
of good text editors out there, but we will use emacs because it's
what I know and it's the best editor around. (If any of you are
vi people, you of course are free to use it. Then again, if
there are vi people out there, why are you in this class?)
Emacs will take some getting used to, but it shouldn't be that hard,
and plus there's no choice ;-)
emacs is your friend
[Here
is the emacs user guide table of contents. There is lots of useful
info in it.]
To make a new, empty text file in emacs, type:
[john@ccbtl11 john]$ emacs newfile.txt
A new, empty emacs window should pop up on your screen. Move your
cursor over the emacs window and type. Type anything. Your dog's name,
your favorite color, or your favorite muppet.
Now, find the "control" key (it may be CTRL on the keyboard). Push it,
and hold it down. Keep holding it down. Now, push X and S in that
order, without holding them down. Take your finger off of the CTRL
key. You just saved what you wrote in a file called
newfile.txt. Now hold down the CTRL key and hit X and then
C. You just closed the emacs window. Do an ls in the directory
to see the file you just made. These are the emacs commands that you
will use about 99% of the time. A longer list of emacs commands (but
by no means a complete one!) is:
Some emacs commands
| CTRL-X-S | Save contents of file. |
| CTRL-X-C | Quit emacs. |
| CTRL-X-F | Open a new file without stopping and restarting emacs. |
| CTRL-V | Scroll down one page. |
| ALT-V | Scroll up one page. |
| CTRL-A | Move to the beginning of the line. |
| CTRL-E | Move to the end of the line. |
| ALT-< | Move to the beginning of the file. (note: hold down Shift to type the < character) |
| ALT-> | Move to the end of the file. (note: hold down Shift to type the > character) |
| CTRL-S | Search for a string letter by letter. In other words, as you type the word you are looking for, emacs finds the letters as you type. |
| CTRL-S, Return | Plain old search for a string. |
| CTRL-G | Start command sequence over again. E.g., if you mess up by hitting CTRL-X-X by accident, hit CTRL-G to start over. CTRL-G is your friend. You can do it anytime. |
There are many, many more emacs commands, but these are some of the most
useful. You can read the emacs tutorial anytime by using the Help tab
at the top of the emacs window (that's right, you can use your
mouse!).
A quick note on file naming conventions
You will notice that in the above example, I named the file
newfile.txt, not just newfile. This is for a
reason. The reason is that the file I created is just a text file
(hence the .txt extension). Proper usage of file extensions is
critical to successful organization in the UNIX environment and in
bioinformatics circles. They are not forced on you - you can name
files anything you want (well, almost anything. See the last
subsection of this section) - but if you do that I guarantee that you
will quickly get confused.
Here are some examples:
- Perl scripts end in the .pl extension
- Files with sequence in the Fasta format end in .fa or .fas or .fasta
- When I run the different forms of blast, I name the output file
so that it corresponds to the type of blast I ran, e.g.:
filename.blastn, filename.blastp, filename.blastx, filename.tblastx,
etc.
and so on ...
Moving files
Let's download a file and look at it in emacs. Get this file. Be sure to note which directory the
file gets saved into from the Netscape download. Make a new directory
below your home directory called yeast. Now, move your new
file into the yeast directory. This is done with the move
command mv. E.g.:
[john@ccbtl11 john]$ ls
chr04.fsa crap/ mp3/ newfile.txt rna/ taing/
[john@ccbtl11 john]$ mkdir yeast
[john@ccbtl11 john]$ mv chr04.fsa yeast
[john@ccbtl11 john]$ ls
crap/ mp3/ newfile.txt rna/ taing/ yeast/
[john@ccbtl11 john]$ ls yeast/
chr04.fsa
[john@ccbtl11 john]$
There is a copy command, too: cp. Its syntax is the same as mv, e.g.:
cp [file to move/copy] [directory to move/copy file to]
Now, go into the yeast directory and look at the chromosome 4 sequence. Pretty exciting, isn't it?
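Here's a minimal sketch of the difference between cp and mv, run in a scratch directory so nothing real gets touched. The scratch directory (from mktemp), the one-line stand-in for the chromosome file, and the backup directory are all made up for illustration.

```shell
# Work in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# A tiny stand-in for the real chromosome file.
echo "ACGT" > chr04.fsa
mkdir yeast backup

cp chr04.fsa backup    # copy: the original stays where it was
mv chr04.fsa yeast     # move: the original is gone from this directory

# Record what ended up where.
in_top=$(ls)
in_yeast=$(ls yeast)
in_backup=$(ls backup)

echo "top level: $in_top"
echo "yeast:     $in_yeast"
echo "backup:    $in_backup"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

After the cp there are two copies of the file; after the mv there are still two, because mv relocated the original rather than duplicating it. The top level is left with just the two directories.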
Removing files
To remove files and directories, use the 'remove' command rm, e.g.:
[john@ccbtl11 john]$ cd
[john@ccbtl11 john]$ ls
crap/ mp3/ newfile.txt rna/ taing/ yeast/
[john@ccbtl11 john]$ rm newfile.txt
rm: remove newfile.txt (yes/no)? y
[john@ccbtl11 john]$ ls
crap/ mp3/ rna/ taing/ yeast/
[john@ccbtl11 john]$
Now to remove the yeast directory, use the
rmdir command, e.g.:
[john@ccbtl11 john]$ rm -f yeast/chr04.fsa
[john@ccbtl11 john]$ rmdir yeast
[john@ccbtl11 john]$
Two things to note here. The first is that the -f flag is the
"force" flag -- the shell didn't ask whether or not I was sure I
wanted the file removed, it just did it. The second is that the
rmdir command only works on empty directories.
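To see the empty-directories-only rule in action, here is a small sketch in a scratch directory (the mktemp scratch directory and the stand-in file are my assumptions for the demo). It tries rmdir twice, once before and once after emptying the directory, and records what happened each time.

```shell
# Work in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir yeast
echo "ACGT" > yeast/chr04.fsa

# First try: yeast still has a file in it, so rmdir should refuse.
if rmdir yeast 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Empty the directory, then try again.
rm -f yeast/chr04.fsa
if rmdir yeast; then
    second_try="removed"
else
    second_try="refused"
fi

echo "first try:  $first_try"
echo "second try: $second_try"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

The cleanup line at the end uses rm -r, which removes a directory and everything inside it in one go. Combined with -f it asks no questions at all, so save it for directories you are very sure about.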
more or less
To look quickly at a file's contents without opening it in emacs, use
the commands more or less. These
quickly and crudely display the file's contents in the terminal. You
can scroll through the file page by page using the spacebar. Press Q to
quit out of them and return to the shell prompt. less is
basically just a fancier more program; it allows you to scroll
through with up and down arrow keys. But less is not installed
on all UNIX computers. These programs are useful when you just want
to check the contents of a file without editing it. (E.g., is my data
in the file important_file1.dat or important_file2.dat?)
Don't use spaces, exclamation points, question marks, ...
One more thing about file names in UNIX. Spaces are no good. Neither
are exclamation points, question marks, and all such things. Don't use
them. Use the underscore "_" or dash "-" instead. Basically just stick
to alphanumerics (a through z, A through Z, and 0 through 9),
underscores, dashes, and periods. I know it's kind of primitive, but
it's just the way it is.
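In case you're wondering why spaces are such trouble: the shell uses spaces to separate the pieces of a command, so a file name with a space in it looks like two file names. A quick sketch, using a made-up file name in a scratch directory (both are just for illustration):

```shell
# Work in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# Make one file with a space in its name (the quotes hold it together).
echo "some data" > "my data.txt"

# Unquoted: ls is handed two names, "my" and "data.txt". Neither exists.
if ls my data.txt >/dev/null 2>&1; then
    unquoted="found"
else
    unquoted="not found"
fi

# Quoted: ls is handed a single name, space and all.
if ls "my data.txt" >/dev/null 2>&1; then
    quoted="found"
else
    quoted="not found"
fi

echo "without quotes: $unquoted"
echo "with quotes:    $quoted"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

So you can work with such names if you quote them every single time, but it's a lot less grief to call the file my_data.txt in the first place.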
Miscellaneous tips and tricks
Here's a bunch of things that don't fit neatly into a category but are
important and useful.
RTFM
Computer folks often use the acronym RTFM (for Read The F*#!ing
Manual) in response to stupid questions. It's not nice, but reading
the manual is oftentimes a better way to learn than asking a
question. UNIX has what are called "man pages" for many (but not all)
of the common commands. If you want to learn more about ls,
for example, type man ls at the command line. Below is an
example of what a typical man page looks like. Try using some of the
options with the ls command to see what the output looks
like. (The -a and -l are the most commonly used
flags with ls.)
[john@ccbtl11 john]$ man ls
NAME
ls - list contents of directory
SYNOPSIS
/usr/bin/ls [ -aAbcCdfFgilLmnopqrRstux1 ] [ file ... ]
For each file that is a directory, ls lists the contents
of the directory; for each file that is an ordinary file,
ls repeats its name and any other information requested.
The output is sorted alphabetically by default. When no
argument is given, the current directory is listed. When
several arguments are given, the arguments are first sorted
appropriately, but file arguments appear before directories
and their contents.
The following options are supported:
-a List all entries, including those that begin with
a dot (.), which are normally not listed.
-A List all entries, including those that begin with
a dot (.), with the exception of the working
directory (.) and the parent directory (..).
-b Force printing of non-printable characters to be
in the octal \ddd notation.
-c Use time of last modification of the i-node (file
created, mode changed, and so forth) for sorting
(-t) or printing (-l or -n).
-C Multi-column output with entries sorted down the
columns. This is the default output format.
-d If an argument is a directory, list only its name
... and so on ...
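As a quick taste of those two flags, here is a sketch you can run in a scratch directory. The file names (including the dot file .secret) and the mktemp scratch directory are made up for the demo.

```shell
# Work in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"
echo "visible" > notes.txt
echo "hidden" > .secret      # names starting with a dot are normally not listed

plain=$(ls)                  # ordinary listing
all=$(ls -a)                 # -a: include the dot entries too
long=$(ls -l notes.txt)      # -l: long format with permissions, owner, size, date

echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"
echo "ls -l notes.txt:"
echo "$long"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

The plain ls shows only notes.txt, while ls -a also shows .secret along with the . and .. entries for the current and parent directories.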
Some other helpful sites are:
Intro to UNIX from Lincoln Stein's CSHL Genome Informatics course.
Intro to Unix commands from Indiana University.
It can also be helpful to do Google searches for UNIX tips. If
you are really into it, I can recommend a couple of good UNIX books.
Tab completion
Tab completion rules. Say you have five files in a directory called
really_really_important_data_wow_so_important1.dat
really_really_important_data_wow_so_important2.dat
really_really_important_data_wow_so_important3.dat
really_really_important_data_wow_so_important4.dat
really_really_important_data_wow_so_important5.dat
After cursing yourself for naming the files so stupidly, you need to
look though the files with less to find the data you
want. Except you don't want to keep typing
really_really_important_data_wow_so_important... every time. Use tab
completion. It works like this. Type "r". Hit tab. The shell will finish typing the names of all the files that begin with "r", up until there is a character that isn't common to all the files. E.g.,
[john@ccbtl11 john]$ r
[Hit tab]
[john@ccbtl11 john]$ really_really_important_data_wow_so_important
[And the shell will type everything out up to the numbers 1, 2, 3, and so on.]
Here you can hit tab two times in a row quickly, and the shell will
give you all of the files that complete the match. This works in any
case. Try hitting tab twice at a blank command prompt. Cool, huh? Play
around with tab completion, it's a handy thing and second nature to
UNIX folks.
Cutting and pasting, UNIX style
In the Windows and Mac world, you can cut and paste type between
programs. You usually highlight what you want, go to the Edit tab at
the top of the program, select Copy, go to other program, put the
cursor where you want to paste the type, go to the Edit tab at the top
of the program and select Paste. You can do this in UNIX, too, but
there's a shortcut. Highlight the text you want to copy using the left
mouse button, select where you want the type to go, and hit the middle
mouse button. It'll paste it there.
Wildcards
You can use what is called a wildcard (the "*" asterisk) when dealing
with lists of files. For example, if you were in a directory with say
100 files in it; some clustalw files, some blastn files, some blastx
files, and so on, and you were interested in only the blastx files in
the directory, you could simply do a ls and look by eye (this
example is from one of my directories, so don't laugh):
[john@ccbtl11 john]$ ls
0_1_all.aln
39_FP_1.fa
BOXSHADE.ps
ORNL_39.out
YHR054C_and_flank.fa
YHR054C_and_flank_0_1_2_4.aln
YHR054C_and_flank_0_1_2_4.dnd
YHR054C_and_flank_0_1_2_4.fa
YHR054C_and_flank_REV.fa
YHR054C_and_flank_REV_0_1.aln
YHR054C_and_flank_REV_0_1.dnd
YHR054C_and_flank_REV_0_1.fa
YHR054C_and_flank_REV_0_1_2_4.aln
YHR054C_and_flank_REV_0_1_2_4.dnd
YHR054C_and_flank_REV_0_1_2_4.fa
allyeast_v_39_FP_1.blastn
allyeast_v_cand_39_intergenic.blastn
cand_39_intergenic.fa
cand_39_intergenic_and_cup1.fa
cand_39_intergenic_and_cup1_intergenic.fa
cand_39_intergenic_and_cup1_intergenic_REV.fa
find_orfs
find_orfs_new_assem
new_others_v_cand_39_intergenic.blastn
new_others_v_cand_39_intergenic.blastn.aln
new_others_v_cand_39_intergenic.blastn.parsed
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn.aln
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn.aln-only-top-two
new_others_v_cand_39_intergenic_and_cup1_intergenic.blastn.parsed
new_others_v_cand_39_intergenic_and_cup1_intergenic_handdone.blastn.parsed
new_others_v_cand_39_intergenic_and_cup1_intergenic_handdone.blastn.parsed.bak
new_v_39_inter_REV.aln
new_v_39_inter_REV.dnd
new_v_39_inter_REV.parsed
nt_v_cand_39_intergenic.blastn
nt_v_cand_39_intergenic.blastx
nt_v_cand_39_intergenic_and_cup1.blastn
others_v_YHR054C_and_flank.blastn
others_v_YHR054C_and_flank.blastn.aln
others_v_YHR054C_and_flank.blastn.dnd
others_v_YHR054C_and_flank.blastn.parsed
others_v_YHR054C_and_flank_REV.blastn
others_v_YHR054C_and_flank_REV.blastn.aln
others_v_YHR054C_and_flank_REV.blastn.dnd
others_v_YHR054C_and_flank_REV.blastn.parsed
others_v_cand_39_intergenic.blastn
others_v_cand_39_intergenic.blastn.aln
others_v_cand_39_intergenic.blastn.dnd
others_v_cand_39_intergenic.blastn.parsed
others_v_cand_39_intergenic_and_cup1.blastn
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn.aln
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn.dnd
others_v_cand_39_intergenic_and_cup1_intergenic_REV.blastn.parsed
others_v_intergenic_and_cup1.blastn
others_v_intergenic_and_cup1.blastn.aln
others_v_intergenic_and_cup1.blastn.dnd
others_v_intergenic_and_cup1.blastn.parsed
others_v_intergenic_and_cup1_intergenic.blastn
others_v_intergenic_and_cup1_intergenic.blastn.parsed
pombe_v_pred_RNA_39.blastn
pred_RNA_39.fa
Which is a bit of a chore to look through, don't you think? Now, if I
wanted to look at only the blastx files,
[john@ccbtl11 john]$ ls *.blastx
nt_v_cand_39_intergenic.blastx
Which is a slightly more manageable output. The wildcard is very
powerful; play with it a bit. It's also very dangerous when used in
the rm command. Be careful!
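One habit that takes most of the danger out of wildcards with rm: run ls with the exact same pattern first, and only then run rm. Here is a sketch with a handful of made-up file names in a scratch directory (all assumptions for the demo):

```shell
# Work in a scratch directory with a few stand-in files.
sandbox=$(mktemp -d)
cd "$sandbox"
for f in a.blastn b.blastn b.blastn.parsed c.blastx d.fa; do
    echo "x" > "$f"
done

# The * matches any run of characters, so these patterns pick out subsets.
blastn_only=$(ls *.blastn)   # names that END in .blastn
starts_with_b=$(ls b*)       # names that START with b

echo "blastn files:"
echo "$blastn_only"
echo "files starting with b:"
echo "$starts_with_b"

# Preview first, then remove with the very same pattern.
ls *.fa
rm -f *.fa
remaining=$(ls)

echo "left after rm:"
echo "$remaining"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

Note that *.blastn does not pick up b.blastn.parsed, because that name doesn't end in .blastn. If I had wanted it too, *.blastn* would have done the job.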
Putting processes in the background
Say you want to start emacs to edit a file but you want to run blast
at the command line, too. You could open a new terminal, and this is a
fine idea, but your desktop can get pretty crowded with windows. You
can put processes in the background in UNIX. If you just type
[john@ccbtl11 john]$ emacs newfile.txt
you get your emacs window, but you don't get the command
line back. To get the command line back, append the "&" symbol to the
end of the command, e.g.:
[john@ccbtl11 john]$ emacs newfile.txt &
[john@ccbtl11 john]$
Voila! You have an emacs window and you have the shell prompt back to
keep on working. If you forget to add the ampersand to the end of the
command, you can get the shell prompt back by hitting CTRL-Z and then
typing bg to put it in the background.
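If you want to try backgrounding without emacs or blast, the sketch below uses sleep (which just waits for a number of seconds) as a stand-in for a long-running job. The sleep, jobs, and wait commands and the special variable $! are standard shell features but are my additions here; they aren't covered elsewhere on this page.

```shell
# Start a "long" job in the background; sleep stands in for blast or emacs.
sleep 2 &
bg_pid=$!                 # $! holds the process ID of the last background job

# The shell did not wait, so this line runs right away.
echo "prompt is back; background job has process ID $bg_pid"

# List the jobs this shell has running in the background.
jobs

# Block until the background job finishes, and pick up its exit status.
wait "$bg_pid"
status=$?
echo "background job finished with status $status"
```

In everyday use you rarely need wait; the shell will tell you when a background job is done the next time it shows you a prompt.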
Redirecting output
Now imagine you are blasting your favorite protein against Genbank,
and that your favorite protein is a kinase. You are gonna get a
helluva lot of hits, and unless you are the Greatest American Hero,
you ain't gonna be able to read the blast report as it scrolls past
your terminal at the speed of light. You need to direct the output
into a file to study at your leisure. You can do this with the ">"
symbol, e.g.:
[john@ccbtl11 john]$ blastp /data/blast-db/genbank my_kinase.fa E=0.001 -cpus=1 -filter=dust -wordmask=seg > genbank_v_mykinase.blastp &
[john@ccbtl11 john]$
This looks complicated, but it's pretty simple. I blasted the file
my_kinase.fa against the genbank database that was located in
/data/blast-db/. The flags E, cpus, filter, and wordmask are
things we will learn about later. The part of the command [ >
genbank_v_mykinase.blastp &] tells the shell to put the results of
the blast search into a file called genbank_v_mykinase.blastp
and to put the whole process in the background so you can keep working
while the blast job is running.
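You don't need blast to play with redirection; any command that prints something will do. The sketch below uses echo in a scratch directory. One thing to know that isn't shown above: a single ">" overwrites the file if it already exists, while a double ">>" (a standard shell feature, my addition here) tacks the output onto the end. The file name report.txt is made up.

```shell
# Work in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first run"  > report.txt     # creates report.txt
echo "second run" > report.txt     # overwrites it; "first run" is gone
echo "third run" >> report.txt     # appends to the end instead

# cat prints a file's contents; capture them so we can look.
contents=$(cat report.txt)
echo "$contents"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

The overwriting behavior is worth remembering before you redirect a second blast run into the same output file as the first.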
Command history
OK, now say you just typed in the long blastp command line argument
above (using tab completion of course), but that you accidentally
typed in my_kinase.f instead of my_kinase.fa. Blastp
will return an error. Instead of retyping the whole command, you can
use the arrow keys. If you hit the "up" or "down" arrow keys, the
shell will show you all of the commands you have typed in during this
session (sometimes, depending on how the system is set up, from
previous sessions, too). You can just edit the mistake you made using
the left, right, backspace, etc. keys (just add an "a" to your
filename in this case) rather than retyping the whole thing.
If you want to see what you have typed in recently, type
history at the command line. It will show you a numbered list
of the commands you have used. If you want to rerun a command and the
command arguments are complicated (as in the blast example above),
type "!" (usually called bang) followed by the number of the
command in the history list and the shell will rerun it.
Moving files to and from remote computers
Here is a website
with a list of free SSH and SCP clients for windows and mac machines.
There are two ways to move files from computer to computer in the UNIX
world, the bad way (using ftp), and the good way (using
scp). ftp stands for File Transfer Protocol, and is the
original file transfer program. It's OK for transferring files, but it
stinks security-wise. It sends your username and password in plain,
human readable text across the network so that any punk 13 year old
kid in the Netherlands can dip into the network traffic, get your
username and password, and then bring down the Genetics department
computers. The Officially-sanctioned Bio5488 Way to transfer files is
with scp, or Secure CoPy. All of the computers that we use
should have scp installed. scp's syntax is simple to use;
it's very similar to the cp command, e.g.:
[john@ccbtl11 john]$ scp somefile.txt john@warlord.wustl.edu:~/somefile.txt
john@warlord's password:
somefile.txt 100% |*****************************| 256 00:00
[john@ccbtl11 john]$
In this case, I am transferring the file somefile.txt to the
computer warlord in my home directory (indicated by the
tilde). The colon after the remote computer's name is important - it
tells scp that we are indeed transferring this file to a
remote computer.
If I were transferring the file from a remote computer to the local
machine, I would do something like this:
[john@ccbtl11 john]$ scp john@warlord.wustl.edu:~/perl/coolprog.pl .
john@warlord's password:
coolprog.pl 100% |*****************************| 196585 00:01
[john@ccbtl11 john]$
Here I have told scp that we want the file called
coolprog.pl from the perl directory (which itself is
in my home directory) to be placed in the current directory, indicated
by the "." (which always means the current directory).
Permissions
Permissions are a very important part of the UNIX
environment. Permissions are the rights that you and others have on
files and directories in the UNIX filesystem. For example, a file that
contains the website for this course will have much different
permissions than the file that contains the scores for your
homeworks. You can give and take away the rights to read, write, and
execute the files and directories under your home directory. Here are
the three types of permissions and what they enable one to do to files
and directories:
- r -- read files; able to ls contents of directories
- w -- write to files; able to create or delete files in directories
- x -- execute files; able to cd into directories
These permissions can be granted to three different kinds of users, plus a shorthand for all of them at once:
- u -- you alone, the user
- g -- your group, i.e. the people in this class
- o -- other users not in the group, i.e. the rest of the world
- a -- all of the above (u, g, and o together)
To look at the permissions on a file or directory, use the ls
-l command. E.g.:
[john@ccbtl11 john]$ cd taing
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxr-xr-x 2 john bio5488 512 Dec 20 10:41 perl/
[john@ccbtl11 taing]$
You can see the two directories hw and perl listed
there, along with a bunch of other info. The important parts are
noted here:
flags  owner  group  world  links  user  group    size  date modified  name
d      rwx    ---    ---    2      john  bio5488  512   Dec 20 10:41   hw/
d      rwx    r-x    r-x    2      john  bio5488  512   Dec 20 10:41   perl/
To change the permissions on files and directories, use the
chmod command, e.g.:
[john@ccbtl11 taing]$ chmod a+w perl/
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxrwxrwx 2 john bio5488 512 Dec 20 10:41 perl/
Which says to give the entire world the right to make and delete files in
the perl directory (bad idea). Let's undo that, and take away execute permission while we're at it:
[john@ccbtl11 taing]$ chmod og-wx perl/
[john@ccbtl11 taing]$ ls -l
total 4
drwx------ 2 john bio5488 512 Dec 20 10:41 hw/
drwxr--r-- 2 john bio5488 512 Dec 20 10:41 perl/
[john@ccbtl11 taing]$
Better. Now everyone can ls the contents of perl,
but no one but you can do anything with the files.
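If you'd like to experiment with chmod somewhere harmless, here is a sketch that runs a round trip on a scratch directory: start at drwxr-xr-x, open the directory up to the world with a+w, then take only the write permission back with og-w, which puts things exactly where they started. The mktemp scratch directory and the cut command (used here just to keep the first ten characters of the ls -l line) are my additions for the demo.

```shell
# Work in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir perl

# Set a known starting point: rwx for me, r-x for group and others.
chmod u=rwx,go=rx perl
start=$(ls -ld perl | cut -c1-10)

# Give everyone write permission (the bad idea).
chmod a+w perl
opened=$(ls -ld perl | cut -c1-10)

# Take write permission back from group (g) and others (o) only.
chmod og-w perl
closed=$(ls -ld perl | cut -c1-10)

echo "start:      $start"
echo "after a+w:  $opened"
echo "after og-w: $closed"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

The -d flag tells ls to describe the directory itself rather than list what's inside it. The pattern for chmod is always who (u, g, o, a), then + or - to give or take away, then which permissions (r, w, x).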
Pipes (the "|" symbol)
Pipes allow you to feed the output of one command into another. The
most common time to use this feature for beginners (that includes me)
is to feed the output of a ls command into the programs
more or less in directories with so many files that
the ls command cannot fit them into a single shell
window. So, for example, if you were to do a ls in the
directory used in the Wildcard section above in a smallish shell
window, the results would just fly by you. Use the pipe:
[john@ccbtl11 john]$ ls | less
and you can scroll through the ls output at your leisure.
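Pipes aren't limited to two commands, and the thing on the receiving end doesn't have to be a pager. As a small sketch, the chain below counts the files in a directory by feeding ls into wc -l (which counts lines) and then into tr (used here only to strip any padding spaces some systems add). The wc and tr commands, the file names, and the scratch directory are my additions for illustration.

```shell
# Work in a scratch directory with five stand-in files.
sandbox=$(mktemp -d)
cd "$sandbox"
for n in 1 2 3 4 5; do
    echo "x" > "file$n.dat"
done

# ls prints one name per line when its output goes into a pipe,
# wc -l counts those lines, and tr -d ' ' removes any padding spaces.
count=$(ls | wc -l | tr -d ' ')
echo "number of files: $count"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

Each command in the chain only knows about the text coming in and the text going out, which is what makes it so easy to snap them together.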
Regular Expressions using grep
Regular expressions (regexps) are one of the reasons that UNIX and
Perl are so widely used in genomics and computational biology. They
take a few minutes to get used to, but after that most people quickly
become hooked on them. What follows is a general introduction to
regexps; we will see these in Perl a lot; we are just
introducing them here.
Regexps are used in many different UNIX programs such as
grep, emacs, and in the programming language
Perl. They are basically a shorthand way of describing a set of
strings without having to fully describe all of the strings (or a set
of substrings you are interested in) in the set. ("String" is a term
that will come up a lot - it's just a computer-sciency term for "list
of characters". A sentence is a string. So is a list of
numbers. Basically, anything you write or read in an emacs window
could be described as a string.) For example, say you had a file of
all the predicted proteins in the human. There would be on the order
of 30,000 ORFs listed. In this case, we could describe a string as one
of the 30,000 entries for the set of ORFs, where each entry (or
string) contained the ORF name, ORF sequence, ORF molecular weight,
and predicted ORF function. You could describe the entire set of
entries as 30,000 strings, or you could use regexps as shorthand to
describe a subset of entries that you are interested in.
Here is a toy example of a file similar to what I described above:
prot01 MKGLRWTYQSDCALA 1650 Unknown
xvr33 MALCPCPCPCDGR 1430 Copper binding
.
.
.
prot30000 MQDALA 660 Unknown
Now let's use grep to parse through this file.
[john@ccbtl11 john]$ grep Unknown human.file
prot01 MKGLRWTYQSDCALA 1650 Unknown
prot30000 MQDALA 660 Unknown
[john@ccbtl11 john]$
It returned the lines that contained the word "Unknown". That's pretty
straightforward.
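Searching for a plain word is only the beginning. Here is a sketch of a few actual regexp patterns, run against a three-line version of the toy file (just the entries shown above, without the 29,997 in the middle). The particular patterns are my picks for illustration: "^" pins the match to the start of a line, "[0-9]" means any single digit, and "." means any single character. The -c flag tells grep to print a count of matching lines instead of the lines themselves.

```shell
# Work in a scratch directory with a three-line stand-in for the toy file.
sandbox=$(mktemp -d)
cd "$sandbox"
cat > human.file <<'EOF'
prot01 MKGLRWTYQSDCALA 1650 Unknown
xvr33 MALCPCPCPCDGR 1430 Copper binding
prot30000 MQDALA 660 Unknown
EOF

# Lines that start with "prot" followed immediately by a digit.
prot_lines=$(grep -c '^prot[0-9]' human.file)

# Lines containing an M, then any one character, then an L.
m_any_l=$(grep 'M.L' human.file)

echo "entries named prot<number>: $prot_lines"
echo "entries matching M.L:"
echo "$m_any_l"

# Clean up the scratch directory.
cd /
rm -rf "$sandbox"
```

Only the xvr33 entry has an M and an L exactly one character apart (the "MAL" at the start of its sequence), so that is the only line the second pattern returns. Put the pattern in single quotes so the shell leaves characters like "*" and "[" alone and hands them to grep untouched.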
Spelling mistakes? Something not clear? E-mail Christina at chen@genetics.wustl.edu
Last updated: Friday Jan 9, 2004
Created by John McCutcheon, 2003