Assignment 9 solution

  1. Perl code for Part 1

  2. Count_freq.pl
  3. SNP Table from Part 1
  4. 	Position	Base1	Base2
    	14	a	t
    	19	a	g
    	25	t	c
    	50	t	g
    	53	g	c
    	60	a	t
    	80	t	c
    	91	t	c
    	93	c	g
    	129	a	g
    	144	a	g
    	148	a	c
    
  5. Perl code for Part 2

  6. Cal_LD.pl
  7. LD Matrix from Part 2
  8. 	14	19	25	50	53	60	80	91	93	129	144	148	
    14		0.2304	0.0784	0.036	-0.0784	0.056	-0.0784	-0.0072	0.0072	0.0456	-0.0072	0.0456	
    19			0.0784	0.036	-0.0784	0.056	-0.0784	-0.0072	0.0072	0.0456	-0.0072	0.0456	
    25				0.036	-0.2464	-0.104	-0.2464	0.0088	-0.0088	0.0376	0.0088	0.0376	
    50					-0.036	0.02	-0.036	-0.028	0.028	-0.036	-0.028	-0.036	
    53						0.104	0.2464	-0.0088	0.0088	-0.0376	-0.0088	-0.0376	
    60							0.104	0.012	-0.012	-0.036	0.012	-0.036	
    80								-0.0088	0.0088	-0.0376	-0.0088	-0.0376	
    91									-0.2496	-0.0608	0.2496	-0.0608	
    93										0.0608	-0.2496	0.0608	
    129											-0.0608	0.2484	
    144												-0.0608	
    148													
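
    For anyone checking an entry by hand, here is a minimal Python sketch (not the Cal_LD.pl script) of the usual pairwise LD coefficient, D = p_AB - p_A * p_B. The function name and the counts are assumptions: 50 sequences, 18 of which carry the same allele at both snp 14 and snp 19. They were back-calculated from the matrix and are not stated in the assignment. The sign of D depends on which allele you label at each site.

```python
# Sketch only: standard pairwise LD coefficient D = p_AB - p_A * p_B.
# The counts used below are back-calculated from the LD matrix under the
# assumption of 50 sequences; they are not given in the assignment text.

def ld_coefficient(n_ab, n_a, n_b, n_total):
    """Return D for two biallelic sites from haplotype counts."""
    p_ab = n_ab / n_total  # frequency of sequences carrying both chosen alleles
    p_a = n_a / n_total    # frequency of the chosen allele at the first site
    p_b = n_b / n_total    # frequency of the chosen allele at the second site
    return p_ab - p_a * p_b

# Assumed counts for snps 14 and 19 (perfectly associated in the table).
d_14_19 = ld_coefficient(n_ab=18, n_a=18, n_b=18, n_total=50)
print(round(d_14_19, 4))  # 0.2304
```

    Under these assumed counts the result agrees with the 14/19 entry of the matrix.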
    
  9. Chi-square Matrix from Part 2
  10. 	14	19	25	50	53	60	80	91	93	129	144	148	
    14	0.00	50.00	5.41	1.17	5.41	2.84	5.41	0.05	0.05	1.82	0.05	1.82	
    19	0.00	0.00	5.41	1.17	5.41	2.84	5.41	0.05	0.05	1.82	0.05	1.82	
    25	0.00	0.00	0.00	1.10	50.00	9.15	50.00	0.06	0.06	1.15	0.06	1.15	
    50	0.00	0.00	0.00	0.00	1.10	0.35	1.10	0.65	0.65	1.09	0.65	1.09	
    53	0.00	0.00	0.00	0.00	0.00	9.15	50.00	0.06	0.06	1.15	0.06	1.15	
    60	0.00	0.00	0.00	0.00	0.00	0.00	9.15	0.12	0.12	1.09	0.12	1.09	
    80	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.06	0.06	1.15	0.06	1.15	
    91	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	50.00	2.98	50.00	2.98	
    93	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	2.98	50.00	2.98	
    129	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	2.98	50.00	
    144	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	2.98	
    148	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	
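
    The chi-square entries are consistent with the common 1-degree-of-freedom statistic N * D^2 / (pA(1-pA) pB(1-pB)). The Python sketch below (again, not Cal_LD.pl) checks two entries. N = 50 and the allele frequencies 0.36 and 0.44 are assumptions back-calculated from the LD matrix, not values given in the assignment, and `ld_chi_square` is a made-up name.

```python
# Sketch only: chi-square for pairwise LD as N * r^2, where
# r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)).
# N and the allele frequencies are assumptions inferred from the tables.

def ld_chi_square(d, p_a, p_b, n):
    """Chi-square (1 degree of freedom) for LD between two biallelic sites."""
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return n * r_squared

N = 50  # assumed number of sequences

chi_14_25 = ld_chi_square(d=0.0784, p_a=0.36, p_b=0.44, n=N)
chi_14_19 = ld_chi_square(d=0.2304, p_a=0.36, p_b=0.36, n=N)

print(f"{chi_14_25:.2f}")  # 5.41
print(f"{chi_14_19:.2f}")  # 50.00
```

    Note that a pair in perfect LD reaches the maximum possible value, N, which is why 50.00 shows up so often in the table.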
    
  11. Haplotype Definition

  12. There are only 2 haplotype blocks:
    block 1: snps 14, 19, 25, 50, 53, 60, 80
    block 2: snps 91, 93, 129, 144, 148

    From the Chi-square table, snp 14 is in significant LD with snps 19, 25, 53, and 80; snp 19 is in significant LD with snps 25, 53, and 80; snp 25 is in significant LD with snps 53, 60, and 80; and so on. If you draw the snps out linearly on a piece of paper and put in the connections between them, you will see that the two groups are linked: snp 25, for example, is in significant LD with snps 53, 60, and 80. So there is one long block and not two (most of you put snps 14-25 on one block and snps 53-80 on another). A common definition of a block is that most snps on it are in high LD with each other; it doesn't necessarily require EVERY snp to be in LD with every other. We accepted any answer with reasonable criteria for block selection.
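
    If you would rather not draw it out, the same check can be sketched in a few lines of Python. The chi-square values are copied from the table above. The cutoff of 3.84 (the 5% critical value with 1 degree of freedom) is an assumption that happens to reproduce the significant pairs listed here; it is not a cutoff stated in the assignment.

```python
# Sketch only: count significant links between the two groups that many
# answers treated as separate blocks, snps 14-25 and snps 53-80.
CUTOFF = 3.84  # assumed: 5% critical value of chi-square, 1 degree of freedom

# Chi-square values copied from the table for every cross-group pair.
cross_pairs = {
    (14, 53): 5.41, (14, 60): 2.84, (14, 80): 5.41,
    (19, 53): 5.41, (19, 60): 2.84, (19, 80): 5.41,
    (25, 53): 50.00, (25, 60): 9.15, (25, 80): 50.00,
}

links = sorted(pair for pair, value in cross_pairs.items() if value > CUTOFF)
print(len(links), "of", len(cross_pairs), "cross-group pairs are significant")
for a, b in links:
    print(f"snp {a} - snp {b}")
```

    With this cutoff, 7 of the 9 cross-group pairs are significant, which is the sense in which the two groups form one long block.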

  13. Part 3

  14. A criterion some of you suggested is to select the snp that is in the strongest LD with others on a block. However, the most important thing to look at is the NUMBER of other snps on the block that a particular snp is in LD with. So snp 50 is definitely not a good choice for the first block: it doesn't appear to be in LD with any other snp on the block. Snps like this can arise because of recent mutations or population stratification.
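
    One way to apply this criterion is to count, for each snp on block 1, how many other snps on the block it is in significant LD with. The Python sketch below does that using the chi-square values from the table in Part 2. The 3.84 cutoff (5% level, 1 degree of freedom) is an assumption, and other reasonable cutoffs would give somewhat different counts.

```python
# Sketch only: number of significant LD partners per snp within block 1.
CUTOFF = 3.84  # assumed significance cutoff, not stated in the assignment

snps = [14, 19, 25, 50, 53, 60, 80]

# Upper-triangle chi-square values copied from the table, restricted to
# block 1; each list runs over the snps to the right of the row snp.
rows = {
    14: [50.00, 5.41, 1.17, 5.41, 2.84, 5.41],
    19: [5.41, 1.17, 5.41, 2.84, 5.41],
    25: [1.10, 50.00, 9.15, 50.00],
    50: [1.10, 0.35, 1.10],
    53: [9.15, 50.00],
    60: [9.15],
}

partners = {snp: 0 for snp in snps}
for i, a in enumerate(snps):
    for b, value in zip(snps[i + 1:], rows.get(a, [])):
        if value > CUTOFF:
            # LD is symmetric, so the pair counts for both snps.
            partners[a] += 1
            partners[b] += 1

for snp in snps:
    print(snp, partners[snp])
```

    Under this cutoff, snps 25, 53, and 80 each have 5 partners, while snp 50 has none.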