You've decided to rotate in the lab of Professor Gene O'myc, who studies the green sulfur bacterium Chlorobium tepidum. In particular, he wants you to find genes that are involved in nitrogen fixation. His lab is working out a pathway for nitrogen fixation that seems to be missing some important components. He believes that, using comparative genomics, you will be able to create a list of candidate genes involved in nitrogen fixation. Your program director feels that graduate students should do at least 20 rotations in their first year, so that leaves you 8 days to create a candidate list. Good luck!
Luckily for you, C. tepidum has been fully sequenced. You can find its entire proteome at this link. Three other nitrogen-fixing prokaryotes have also been sequenced:
Methanococcus maripaludis
Rhodopseudomonas palustris
Bradyrhizobium japonicum
Here is a rough tree of the four nitrogen-fixing microbes:
Question 1: If you were only going to use one of these species, which comparison with C. tepidum would be the most useful for your problem? Why?
Using only protein sequences, use any method you want (including Perl scripts from previous assignments) to create a list of fewer than 100 genes that may be involved in nitrogen fixation in C. tepidum.
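One possible starting point, sketched in Python rather than Perl, is to keep only the C. tepidum proteins that have a significant hit in every one of the other nitrogen-fixing proteomes. The sketch assumes you have already run a protein similarity search (for example BLASTP with 12-column tabular output, `-outfmt 6`) of the C. tepidum proteome against each of the other proteomes. The protein IDs, the sample hit lines, and the e-value cutoff of 1e-10 are all hypothetical and only illustrate the idea; this is not a prescribed method for the assignment.

```python
from typing import Dict, Iterable, List, Set


def queries_with_hits(blast_lines: Iterable[str], max_evalue: float = 1e-10) -> Set[str]:
    """Return query IDs with at least one hit at or below max_evalue.

    Assumes 12-column BLAST tabular output, where column 1 is the query ID
    and column 11 is the e-value.
    """
    hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def conserved_candidates(per_genome_lines: Dict[str, List[str]],
                         max_evalue: float = 1e-10) -> List[str]:
    """Return query IDs that have a significant hit in every genome searched."""
    hit_sets = [queries_with_hits(lines, max_evalue)
                for lines in per_genome_lines.values()]
    if not hit_sets:
        return []
    return sorted(set.intersection(*hit_sets))


# Hypothetical tabular hits (made-up IDs and scores, for illustration only).
example = {
    "M. maripaludis": [
        "CT0001\tMM_17\t62.0\t280\t100\t3\t1\t280\t5\t284\t1e-50\t180",
        "CT0002\tMM_90\t25.0\t80\t55\t4\t10\t90\t3\t82\t2e-3\t35",
    ],
    "R. palustris": [
        "CT0001\tRP_42\t70.0\t281\t80\t2\t1\t281\t1\t281\t1e-80\t250",
        "CT0003\tRP_77\t55.0\t300\t120\t5\t1\t300\t2\t301\t1e-40\t160",
    ],
    "B. japonicum": [
        "CT0001\tBJ_08\t69.0\t279\t82\t2\t1\t279\t1\t279\t1e-78\t245",
        "CT0003\tBJ_31\t54.0\t298\t125\t5\t1\t298\t4\t301\t1e-38\t155",
    ],
}

print(conserved_candidates(example))
```

In this toy input only `CT0001` passes the cutoff in all three searches. On real proteomes a filter like this will also keep widely conserved genes unrelated to nitrogen fixation, so the result may still need to be narrowed to get under 100 genes.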
Here is a list of genes known to be involved in nitrogen fixation in C. tepidum.
Question 2: How many known genes are in your list? Assuming these 12 genes are the only genes involved in nitrogen fixation, what is your false negative rate (the fraction of true genes you missed)?
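The false negative rate asked for here is the number of known genes missing from your candidate list divided by the total number of known genes (12). A minimal Python sketch of that arithmetic follows; the gene IDs are hypothetical placeholders, not the real known genes.

```python
from typing import Iterable


def false_negative_fraction(known_genes: Iterable[str],
                            candidate_genes: Iterable[str]) -> float:
    """Fraction of known genes that are absent from the candidate list."""
    known = set(known_genes)
    if not known:
        raise ValueError("the list of known genes must not be empty")
    missed = known - set(candidate_genes)
    return len(missed) / len(known)


# Hypothetical IDs: 12 known genes, 9 of which appear among the candidates.
known = [f"known_{i}" for i in range(1, 13)]
candidates = known[:9] + ["other_1", "other_2"]

print(false_negative_fraction(known, candidates))  # 3 missed out of 12 = 0.25
```

Candidates that are not on the known list (`other_1`, `other_2` here) do not affect this number; they only matter for false positives.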
Question 3: If you had more time, how could you improve your candidate list?
Turn in any scripts you use as well as an outline of your method and the list of your candidate genes.