Hello Everyone,
I would appreciate the opinions of anyone with experience running XPEHH. I have run this haplotype-based selection test on phased whole-genome SNP data to compare two recently split non-human populations. I am trying to detect regions of population-specific selection.
I expected to see regions with a peak XPEHH score flanked by a decay in the score as linkage breaks down. However, I occasionally see very sharp peaks in the XPEHH score that are only about 1 Mb wide (e.g. the peak at about 86 Mb in the figure linked below):
http://postimg.org/image/4ns7d9wyb/
Do people have suggestions about how to interpret these sharp peaks? My first thought is that they result from some kind of error in the SNP calling. They might come from a population-specific recombination hotspot, but the populations only split ~100 generations ago, so this seems unlikely. Any thoughts or questions are welcome regarding how to interpret such a plot when looking for selection with XPEHH.
Thanks in advance for your help,
Best regards,
Rubal
Popn1 is fixed for the A allele, and Popn2 has the A allele at 15% frequency.
I guess that for the neighboring SNPs the situation is the opposite: Pop1 has low frequencies for the minor alleles, while Pop2 has higher frequencies. I think the peak is due to a problem with the definition of which allele is the minor allele in one of the populations. Consider that if you used the other allele as the minor allele, that SNP would have a score of about -1.5, and the peak would not look so isolated.
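One way to check for this kind of labeling problem is to put both populations on the same allele coding before comparing them. Below is a minimal Python sketch of that idea, not the procedure of any particular XPEHH program. It assumes you can export, per population, the coded allele, the other allele, and the coded allele's frequency for each SNP. The names `SnpRecord` and `harmonize`, the SNP ID, and the second allele "G" are all made up for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class SnpRecord(NamedTuple):
    coded_allele: str  # allele whose frequency is reported
    other_allele: str  # the alternative allele at this SNP
    freq: float        # frequency of coded_allele in this population


def harmonize(
    pop1: Dict[str, SnpRecord], pop2: Dict[str, SnpRecord]
) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Express both populations' frequencies in terms of pop1's coded allele.

    Returns (frequencies per SNP, list of SNP IDs where pop2 used the
    opposite coding and had to be flipped).
    """
    harmonized: Dict[str, Tuple[float, float]] = {}
    flipped: List[str] = []
    for snp_id, rec1 in pop1.items():
        rec2 = pop2.get(snp_id)
        if rec2 is None:
            continue  # SNP absent from pop2: nothing to compare
        same = (rec2.coded_allele, rec2.other_allele) == (
            rec1.coded_allele, rec1.other_allele)
        swapped = (rec2.coded_allele, rec2.other_allele) == (
            rec1.other_allele, rec1.coded_allele)
        if same:
            harmonized[snp_id] = (rec1.freq, rec2.freq)
        elif swapped:
            # pop2 reports the other allele, so convert its frequency
            harmonized[snp_id] = (rec1.freq, 1.0 - rec2.freq)
            flipped.append(snp_id)
        else:
            raise ValueError(f"Allele labels do not match at {snp_id}")
    return harmonized, flipped


# Hypothetical example mirroring the SNP discussed above: the first
# population is fixed for A, so its "minor" allele is G (frequency 0),
# while in the second population the minor allele is A (15%).
pop1 = {"snp_86Mb": SnpRecord("G", "A", 0.0)}
pop2 = {"snp_86Mb": SnpRecord("A", "G", 0.15)}

freqs, flipped_ids = harmonize(pop1, pop2)
print(flipped_ids)
print(freqs)
```

If the SNP under the peak shows up in the flipped list while its neighbors do not, that would be consistent with the peak coming from inconsistent minor-allele coding rather than from selection.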
Ah, good point, I will look into that. This makes a lot of sense. As XPEHH is a haplotype-based test, shouldn't it avoid problems from this kind of inappropriate labeling?