Perl Crash Course 2005

This is a learn-by-example crash course in Perl. Reading this will not make you a seasoned perl hacker, but we hope that this primer combined with the discussion section will get you to the point that you can start getting things done with perl.

To really learn perl, you will need to do a lot of studying on your own. That's why we recommend you buy Learning Perl.

When you want to look at some real programs, we have written two simple ones:
numbers.pl
grep.pl

Here are two more useful, but more complicated programs. They are overly documented to make sure every line makes sense. You should probably open these files in emacs, not your browser. Right click and select “save target as”. You will need to change the file names from .txt to .pl to run them and view the syntax highlighting in Emacs.

ReadTabDelimited.pl    file1

ReadFastaFile.pl    fastafile1

 

What is Perl?

Perl's inventor, Larry Wall, jokingly expands the name as Pathologically Eclectic Rubbish Lister (the more respectable expansion is Practical Extraction and Report Language). I'm not sure what this means, but let it suffice to say that Perl is the language of choice for writing quick and dirty programs to manage large text files (such as microarray data and DNA sequences). It's more powerful than the Linux commands you've learned, but is faster to write than most other programming languages. To use Perl you will write scripts, which are simply text files containing Perl instructions. A script is processed by a program called the Perl Interpreter, which converts the script into a form the machine can run and then runs it. So, one way to think of your script is as instructions for the machine. However, anyone who has had Perl hand back a funny result (in other words, anyone who has used Perl) may tell you that they are actually just suggestions for the machine.

Step 1: Running a Perl program.

It's important to know where the Perl Interpreter resides. To learn this, type:

[john@ccbtl11 john]$ which perl
/usr/bin/perl
[john@ccbtl11 john]$

at the command line. On this computer, the interpreter resides at /usr/bin/perl. Now we basically have all the info we need to run our first perl program. Open emacs (what is emacs?) and create a file called first_script.pl. Then type:

#!/usr/bin/perl

The #! line tells the operating system to hand the rest of the file to the Perl Interpreter at /usr/bin/perl. It must be the very first line of the file.
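Here is a minimal sketch of what a complete first_script.pl could look like, assuming the interpreter lives at /usr/bin/perl as on the machine above. The message text and the variable name $greeting are made up for illustration; the my keyword is explained in the Strict section.

```perl
#!/usr/bin/perl
# first_script.pl: prints one line and exits.

# The message is only an example; any text will do.
my $greeting = "Hello from Perl!";
print "$greeting\n";
```

Save the file and run it with perl first_script.pl. To run it as ./first_script.pl instead, first make it executable with chmod +x first_script.pl.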

Strict

One option in a perl program is the “Strict” option. “Strict” enforces additional rules on when and where you can create different variables. While these rules may seem like a pain, they will prevent you from doing stupid things (like misspelling a variable name and silently creating a second, empty variable instead of using your data).

You should add

use strict;

to the top of your program. The strict option requires you to use the keyword “my” when declaring a variable (see below). This defines the scope of the variable. For now, just add the my; it will eventually become clear what “the scope of the variable” means.
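To see what strict buys you, here is a small sketch in which a variable name is misspelled on purpose. The names $total and $totl are invented for this example. Normally strict would stop the script before it starts; wrapping the bad line in eval is just a trick so the error message can be printed.

```perl
#!/usr/bin/perl
use strict;

my $total = 5;

# The misspelled name $totl is compiled at run time inside eval, so the
# strict error is captured in $@ instead of stopping the script.
my $ok = eval q{ $totl = $total + 1; 1 };
my $error = $@;

if (!$ok) {
    print "strict refused to compile the typo:\n$error";
}
```

Without use strict, the misspelled line would quietly create a new variable and $total would keep its old value.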

Step 2: Variables

Scalars

To make a Perl script actually do something, you have to give it data to work with. This data is stored in the form of variables. The simplest variable, the scalar, contains one piece of data. A scalar is denoted by a $ sign. For example, type:

my $number = 5;

There are two things to notice about this line. First, the = sign. This is an operator that tells the script that $number contains the value to its right. Second, the semicolon tells Perl that this is a complete statement. Leaving out semicolons is one of the most common reasons that your scripts won't run the first time.

Now let's have Perl perform a simple calculation:

my $number = 5;
my $another = 6;
my $sum = $number + $another;

Now the perl script takes the values of $number and $another and adds them, storing this new value in $sum. Unfortunately, you haven't told the program to do anything with this data, so it just gets thrown away. Type:

print "The sum is:\n$sum\n\n";

The print function tells the program to print whatever follows it, up to the semicolon. Inside double quotes, Perl replaces $sum with its value. \n is a backslash escape, valid inside double quotes, that tells the program to print a newline. Read more about double and single quoted literals and backslash escapes in Learning Perl.
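The difference between the two kinds of quotes is easiest to see side by side. This is a short sketch; the variable names $double and $single are made up for the example.

```perl
#!/usr/bin/perl
use strict;

my $sum = 11;

# Double quotes: $sum is replaced by its value and \n becomes a newline.
my $double = "The sum is: $sum\n";

# Single quotes: everything is kept literally, including $sum and \n.
my $single = 'The sum is: $sum\n';

print $double;
print $single, "\n";
```

The first print shows the number 11; the second shows the characters $sum and \n exactly as typed.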

Arrays

Arrays are lists of items indexed by numbers. A scalar holds one piece of data, but an array can hold many in list form. Arrays are designated by the @ symbol (such as @array), while each item in the list is denoted by a $ and a number in square brackets, beginning with zero (such as $array[0]). Lists can be assigned to arrays in several ways, one of which is shown below:

#!/usr/bin/perl
 
my @Jetsons = qw/George Jane Judy Elroy/;
print "Meet the Jetsons! \nFather:  $Jetsons[0] \nSon:  $Jetsons[3] \n";

The result will be:

[john@ccbtl11 john]$ ./array.pl
Meet the Jetsons!
Father:  George
Son:  Elroy
[john@ccbtl11 john]$
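A few more things you will often want to do with an array are sketched below: find its length, find its last index, and add an element. The extra family member "Astro" is added only for illustration.

```perl
#!/usr/bin/perl
use strict;

my @Jetsons = qw/George Jane Judy Elroy/;

# scalar() asks for the array as a single number, which is its length.
my $count = scalar(@Jetsons);

# $#Jetsons is the index of the last element, always length minus one.
my $last_index = $#Jetsons;

# push adds an element to the end of the array.
push @Jetsons, "Astro";

print "Started with $count members, last index $last_index\n";
print "Newest member: $Jetsons[-1]\n";   # index -1 counts from the end
```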

Hashes

Hashes are like arrays, but each element is indexed by a name. They are denoted by the % sign. Each element is called a value, and its index is called a key. Elements are denoted by the syntax $hash_name{"key"}. Using a hash, you can list the names of the Jetsons as values, with their places in the family as keys. For example:

#!/usr/bin/perl
 
my %Jetsons = (
        "Father" => "George",
        "Mother" => "Jane",
        "Daughter" => "Judy",
        "Son" => "Elroy"
        );
 
print "\nMeet $Jetsons{Father} Jetson. $Jetsons{Mother}, his wife!\n\n";

Notice that the syntax for assigning hashes looks different from that for arrays: each entry is written as "key" => "value", and the entries are separated by commas.
The output is:

 
[john@ccbtl11 john]$ ./hash.pl
 
Meet George Jetson. Jane, his wife!
 
[john@ccbtl11 john]$
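Once the data is in a hash, you will usually want to walk through all of it. The sketch below uses the keys function and a foreach loop (loops are covered in Step 3); the names @roles and $has_dog are made up for the example.

```perl
#!/usr/bin/perl
use strict;

my %Jetsons = (
    "Father"   => "George",
    "Mother"   => "Jane",
    "Daughter" => "Judy",
    "Son"      => "Elroy",
);

# keys returns the keys in no particular order, so sort them to get
# predictable output.
my @roles = sort keys %Jetsons;

foreach my $role (@roles) {
    print "$role: $Jetsons{$role}\n";
}

# exists tests whether a key is present in the hash.
my $has_dog = exists $Jetsons{"Dog"} ? "yes" : "no";
print "Is there a Dog entry? $has_dog\n";
```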

Step 3: Input and Output

A program is useless if it doesn't interact with the user. It has to be able to read in data and perform operations on the inputs. After calculations are done, the program needs to either show the results on screen or save them to a file for future retrieval. The following four sections will give you a glimpse into how to deal with input and output in Perl. Perl is popular for this kind of work because it makes input and output easy. Keep in mind that there are many other ways to do the same thing in Perl, so if you are more comfortable writing input routines in a particular way, or if you find a simpler way to do what I am going to show you below, feel free to stick to it.

1.      Read from Command Line

Hopefully by now you know what "command line" is. Let's say you want to get the sum of 2 numbers the user enters on command line. So the user types in the following:

[john@ccbtl11 john]$ perl add.pl 10 8

As a safety precaution (because I don't trust users. period), I always check the number of arguments first.

if($#ARGV != 1) 
{
  exit;
}

By default, Perl puts the command line arguments in an array named @ARGV. Putting $# in front of an array's name gives you the last index of the array, so $#ARGV gives the last index of the command line arguments. Perl does not count the script name as an argument, so in this example the size of the array is 2 and the last index is 1. If you need the name of the script somewhere in the program, it is stored in $0 (a $ with a zero). exit ends the program if the array size is not correct.

If the number of arguments is correct, we can read them in.

my $num1=$ARGV[0];
my $num2=$ARGV[1];
my $total=$num1+$num2;

and so on. See below for printing the results onto standard output.

Fancier techniques let the user enter any number of arguments and then use a while-loop to work through them.
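One possible way to handle any number of arguments is sketched here. The subroutine name add_all is hypothetical, and subroutines themselves are not covered in this primer; treat it as a preview rather than the required approach.

```perl
#!/usr/bin/perl
use strict;

# add_all is a made-up helper: it sums however many numbers it is given.
sub add_all {
    my @numbers = @_;
    my $total = 0;

    # Keep taking the first remaining number until the list is empty.
    while (@numbers) {
        $total += shift @numbers;
    }
    return $total;
}

# @ARGV holds the command line arguments, however many there are.
my $count = scalar(@ARGV);
my $total = add_all(@ARGV);
print "Sum of $count arguments: $total\n";
```

Running perl add.pl 10 8 4 would then add all three numbers instead of exiting.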

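Note that the above piece of code runs without complaint even if the user enters strings instead of real numbers: Perl treats a string that does not look like a number as 0, so the command below gives a total of 10. If you are interested, try it out and see.
[john@ccbtl11 john]$ perl add.pl hahaha 10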
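If you would rather reject input like hahaha, one option is the looks_like_number function from the Scalar::Util module that ships with Perl. This is only a sketch; the subroutine name check_numbers is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use Scalar::Util qw(looks_like_number);

# check_numbers is a made-up helper: it returns the arguments that do
# not look like numbers, so the caller can report them.
sub check_numbers {
    my @bad = grep { !looks_like_number($_) } @_;
    return @bad;
}

# Stop with a message if any command line argument is not a number.
my @bad = check_numbers(@ARGV);
if (@bad) {
    die("Not a number: @bad\n");
}
```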

2.      Read from File

Reading from files works almost the same way. I still recommend that you check the arguments (i.e., whether the user has given a file name. You don't need to check whether that file exists, as I am going to show you next). For now, let's assume the user has entered a file name:

[john@ccbtl11 john]$ perl add.pl number.txt

Let's assume (yes, I make a lot of assumptions) number.txt has the following format:

3
4
5
8
9

This is my routine for reading in the file:

# @number was created earlier in the program with: my @number;
open(infile,$ARGV[0]) || die("cannot open file $ARGV[0]: $!");
while(my $line=<infile>)
{
  chomp($line);
  push @number, $line;
}
close(infile);

open is the command you use to open a file.  infile is a file handle, which for now we can consider as just another variable containing the file being read. $ARGV[0] contains our file name. || means 'or'. die tells the program to terminate and prints the statement in quotation on screen. It is a good practice to do that so you know why and what files cause problems. So in summary, the first statement literally tells the program to "open this file, if not possible, just DIE". (now you see why I like Perl? very simple to code up).

A while-loop is a structure that executes the code in {} over and over again until the condition in () is no longer true. $line=<infile> assigns the next line of the input file to the variable $line. Each time around the loop, the following line is assigned to $line (which overwrites whatever value $line held). chomp is a very useful command, as it removes the newline ("enter") character at the end of the line. Be aware that files created under Windows and transferred to Unix end each line with an extra carriage return (shown as ^M), which chomp on Unix does not remove; stripping it yourself, for example with $line =~ s/\r$//; after the chomp, will save you time in debugging. @number is an array I created earlier in the program and push just adds whatever value $line is holding to the end of the array.

close terminates the file handles. It is good practice to do so since in the real world, multiple programs may run at the same time and need access to the same file.
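Putting the pieces together, here is a sketch of the whole reading routine wrapped in a subroutine. The name read_numbers is hypothetical. It uses a different but equivalent style of open, with the file handle stored in an ordinary variable ($in) and the read mode '<' given separately, and it strips Windows line endings as well as Unix ones.

```perl
#!/usr/bin/perl
use strict;

# read_numbers is a made-up helper that returns the lines of a file as
# a list, with Unix (\n) or Windows (\r\n) line endings removed.
sub read_numbers {
    my ($filename) = @_;
    my @number;

    open(my $in, '<', $filename) || die("Cannot open $filename: $!\n");
    while (my $line = <$in>) {
        $line =~ s/\r?\n$//;    # like chomp, but also drops a trailing ^M
        push @number, $line;
    }
    close($in);

    return @number;
}

# Only read a file if exactly one file name was given.
if (@ARGV == 1) {
    my @number = read_numbers($ARGV[0]);
    print "Read ", scalar(@number), " lines from $ARGV[0]\n";
}
```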

3.      Write to Screen

As in most other programming languages, the simplest command for this purpose is print. Two common escapes are \t and \n: \t puts in a tab, whereas \n puts in a newline. You can also use the dot operator to join several strings in a single print. For example:

print "It is a good day, "."but it is long.\n";

This is the same as

print "It is a good day, but it is long.\n";
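When you want to print a whole array with tabs between the values, the join function saves you from writing the dots yourself. A short sketch, with made-up variable names:

```perl
#!/usr/bin/perl
use strict;

my @values = (3, 4, 5);

# The dot operator glues strings together; \t inserts a tab.
my $header = "first" . "\t" . "second" . "\t" . "third" . "\n";

# join puts the separator between the elements, not after the last one.
my $row = join("\t", @values) . "\n";

print $header;
print $row;
```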

4.      Write to File

The general structure is similar to that of reading in a file. Let's say you want to save the following array in a file called "people.txt":

my @names=("John","Mike", "Bob");

You want the format in people.txt to be:

John     Mike    Bob

This is what you would do:

open(outfile, ">people.txt") || die("Cannot create people.txt");
foreach my $element (@names)
{
  print outfile $element."\t";
}
close(outfile);

> in front of the file name tells Perl to write to the file, replacing anything already in it. >> adds on to the end of the file instead, so be careful about which one you use. The foreach-loop is another useful structure. Here it loops from the first element of the given array, @names, to the last. The only difference between this print statement and those shown so far is that outfile is inserted before the items to be printed. It tells Perl to print to this file handle instead of to standard output (which is the screen).

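The sketch below shows overwriting and appending side by side. The subroutine name save_names and the extra name "Alice" are invented for the example, and it uses join so that no tab is left dangling after the last name. Note that it really does create people.txt in the current directory.

```perl
#!/usr/bin/perl
use strict;

# save_names is a made-up helper. $mode is ">" to overwrite the file
# or ">>" to add to the end of it.
sub save_names {
    my ($mode, $filename, @names) = @_;
    open(my $out, $mode, $filename) || die("Cannot write $filename: $!\n");
    print $out join("\t", @names), "\n";
    close($out);
}

my @names = ("John", "Mike", "Bob");
save_names(">",  "people.txt", @names);    # people.txt now has one line
save_names(">>", "people.txt", "Alice");   # a second line is appended
```
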
Things to put in your programs

#1: Comments
Comments are lines of a script that begin with the # symbol. Except for the first line (#!), these lines are not read by the Perl Interpreter. Comments are good. They help others see what your intentions are and they help you figure out what your intentions were when you look at your code 3 months later (trust us this is important). Please use comments liberally. Homework assignments that are not commented well will not be given full credit.

#2: Indentation
It is important to indent your code so that you and others can easily follow the program through the steps it takes. No particular indentation style is required, as long as you are consistent. One format is shown below:

while ($scalar eq "true") {
        # do something
        # do something else
        # and so on
        if ($scalar_2 == $scalar_1) {
               # do this
               # do that
        }
}

Debugging "Techniques"

I know, I know. When a program that you've worked on for 3 hours just doesn't work for no reason, you just want to smash the computer and call it quits. The following are a few tips that might just save you from doing that (needless to say, I learned them from painful experience):

1.      Check that semi-colon

More often than not, you have an extra semi-colon somewhere or forgot to put one in.

2.      Check matching brackets

Emacs usually does a good job of checking matching brackets. But just in case, you might want to check that each piece of code sits inside the group of brackets you intended.

3.      Check double and single quotations

In Unix and Perl, double and single quotation marks have different meanings. Be sure to use the right kind of quotation marks.

4.      Check == and =

== means 'equal to' while = means 'assigned a value of'. Be sure to use the right one, especially when you are doing calculations and comparisons in the same program. Also remember that == compares numbers; to compare strings, use eq.
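The following sketch shows how the three operators behave on the same kind of data. All variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;

my $first  = "10";
my $second = "10.0";

# == compares as numbers: 10 and 10.0 are the same number.
my $same_number = ($first == $second) ? "yes" : "no";

# eq compares as strings: "10" and "10.0" are different text.
my $same_string = ($first eq $second) ? "yes" : "no";

# A single = assigns. The "test" below changes $x to 5 and is then
# true simply because 5 is not zero.
my $x = 3;
my $always = ($x = 5) ? "true" : "false";

print "Same number? $same_number\n";
print "Same string? $same_string\n";
print "x is now $x and the test was $always\n";
```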

5.      Check initial values

When you do calculations on a variable, it is usually wise to initialize it to 0. A variable that has never been given a value holds the special value undef, which Perl quietly treats as 0 in arithmetic and as an empty string in text, and unless warnings are turned on (with use warnings; or the -w switch) Perl will not tell you about it. This could be the reason you are getting strange results!
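Here is a small sketch of what Perl actually does with a variable that was never given a value, assuming warnings are switched on with use warnings. Catching the warning in an array through $SIG{__WARN__} is only done here so the message can be shown as part of the output; the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of sending them to the screen.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $total;                  # declared but never given a value: undef
my $result = $total + 5;    # undef counts as 0 here

print "Result: $result\n";
print "Perl warned: $warnings[0]" if @warnings;
```

With warnings on, Perl reports the use of an uninitialized value and names the variable, which usually points straight at the bug.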

6.      Use the Perl debugger!!!!!

Debuggers are a great thing. They allow you to walk line by line through your code and print out the values of variables as you go. You can see exactly what is going on. The debugger is pretty intuitive to use. Just add -d to your command line (e.g. perl -d myprogram.pl argument1 argument2 …). You’ll see the first line of your code. Type h to see all of the commands. You’ll probably only need “n” (run the next line), “s” (step into a subroutine), “c” (continue to the next breakpoint), “b” (set a breakpoint), and “p” (print a value).

Hopefully these tips will help you get "unstuck" in programming!

 

Other Resources

A great place to learn about perl and unix is the Cold Spring Harbor Course Genome Informatics, taught by Suzanna Lewis, Lincoln Stein, and others. This website has great introductory lectures on all things Unix and Perl. http://stein.cshl.org/genome_informatics/

If you’re feeling like you’ve mastered the basics and want to get fancy, BioPerl is a public resource for biology specific perl code. There are libraries of functions for parsing BLAST output, downloading data from NCBI, and many other fun bioinformatics/genomics tasks. I don’t recommend trying to use this until you understand Perl modules, object oriented programming, and have a lot of free time to read the documentation.


Spelling mistakes? Something not clear? E-mail Scott at sdoniger@wustl.edu
or Jay at jgertz@artsci.wustl.edu
Last updated:
Wednesday January 19, 2004