Assignment 5: Writing your own mass spec software and an adventure in scientific collaboration

You recently received an email from the genomics guru Craig Venter asking (no, begging) you to collaborate with him on some mass spectrometry experiments his institute is undertaking (apparently the word is out that you are taking the revolutionary Introduction to Genomics course at WashU). His group has engineered an organism that is apparently reliant on 10 synthetic proteins that they have inserted into the organism's genome. (They have named the organism Ego maximus.) However, it's unclear whether these proteins are expressed because they can't visualize them by PAGE. They are going to try to detect these 10 proteins in E. maximus by the standard tryptic digest/LC/ES ionization/MS/MS experiments that you learned about this week.

Craig has sent you a file that contains the 10 E. maximus protein sequences. He wants you to write a Perl script that takes in a file of FASTA-formatted protein sequences and outputs:

... OK, now a few weeks have passed, and you are feeling pretty good about yourself. But then you get a frantic email from your buddy Craig saying that you are a poor bioinformatician because the list of predicted trypsin fragments you sent him doesn't quite match the data they see in their experiments. Worried, you ask him to send you the data. He sends you a list of fragment masses (i.e., not the raw m/z values) that they got for one of the ten proteins (i.e., from an LC/MS experiment, not an LC/MS/MS experiment). Click here for the file of the observed masses.
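
As a starting point for comparing predicted and observed masses, here is a minimal sketch of an in silico tryptic digest. It is written in Python for brevity, not the Perl the assignment asks for, so treat it as an illustration of the logic only. It rests on several assumptions that are not stated above: trypsin is modeled with the common simplified rule (cleave after K or R unless the next residue is P), there are no missed cleavages, masses are monoisotopic, and no residues are modified. The function names (read_fasta, tryptic_digest, peptide_mass) are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (H on N-term, OH on C-term)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_digest(seq):
    """Split seq after every K or R that is not followed by P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without a trailing K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    # Made-up two-record input, only to show the call pattern.
    demo = [">demo1", "AKRPGK", "LLR", ">demo2", "GGKA"]
    for name, seq in read_fasta(demo):
        for pep in tryptic_digest(seq):
            print(name, pep, round(peptide_mass(pep), 4))
```

Each predicted mass can then be compared against the observed list within some tolerance. Keeping the cleavage rule and the mass table separate from the matching step makes it easier to test alternative hypotheses when the two lists disagree.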

Question 1: Which protein is it? Can you suggest an alternative explanation for the mismatch (other than you being a crappy coder), such as something rooted in protein biochemistry?

Question 2: The second trypsin peptide in protein 1 is SCHLLR. Predict what you would actually see in the real MS experiment. That is, write down the two most prevalent m/z values you would expect to see for the whole peptide and for the entire daughter ion spectrum.
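
The arithmetic behind a question like this can be sketched in a few lines. The example below is in Python, not Perl, and is only an illustration under stated assumptions: monoisotopic masses, an unmodified peptide (for example, no cysteine alkylation), m/z computed as (M + z x proton mass) / z, and a daughter ion spectrum limited to singly charged b and y ions. Which charge states and ion types actually dominate is left for you to argue. The function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(neutral_mass, charge):
    """m/z of a species carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge


def b_ions(peptide):
    """Singly charged b ions b1..b(n-1): N-terminal fragments plus a proton."""
    total, ions = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide):
    """Singly charged y ions y1..y(n-1): C-terminal fragments plus water and a proton."""
    total, ions = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


if __name__ == "__main__":
    pep = "SCHLLR"
    mass = peptide_mass(pep)
    for z in (1, 2, 3):
        print(f"[M+{z}H]{z}+ m/z = {mz(mass, z):.4f}")
    for i, (b, y) in enumerate(zip(b_ions(pep), y_ions(pep)), start=1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Passing the charge explicitly to mz keeps the chemistry decision (which charge states to report) out of the arithmetic.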


Your scripts and answers are due Friday, Feb 20th, 2009 at midnight. Enjoy.