Hello everyone,
I am looking for the best way to get linkage information from unphased whole-genome population data. I have a VCF file with multiple individuals from several populations. The data are unphased, but I would like to detect regions with an excess of linkage disequilibrium as a signature of positive selection. I have not phased the data because I have only a limited number of individuals per population of a non-model species, and I worry that phasing would be very inaccurate.
What do people think would be the best way to detect regions with high levels of linkage disequilibrium? I was thinking that something like the VCFtools --geno-r2 option might be suitable.
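VCFtools' --geno-r2 reports the squared correlation between genotypes coded as 0, 1 and 2 (the number of non-reference alleles per individual), which is why it does not need phased data. A minimal sketch of that statistic for a pair of sites, assuming genotypes have already been converted to dosages with None for missing calls; genotype_r2 is a hypothetical helper, not VCFtools code. Running it within each population separately (for example by subsetting individuals with VCFtools' --keep) avoids LD that arises simply from pooling populations with different allele frequencies.

```python
import math


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two sites' genotype dosages.

    g1, g2: per-individual counts of non-reference alleles (0, 1 or 2),
    with None for a missing call. Individuals missing at either site are
    dropped (a choice made for this sketch).
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # site is monomorphic in this sample
    return cov * cov / (var_a * var_b)


site1 = [0, 1, 2, 1, 0, 2]
site2 = [0, 1, 2, 1, 0, 2]
print(genotype_r2(site1, site2))  # 1.0
```

Because r² is squared, perfectly anti-correlated dosages also give 1.0; averaging pairwise values over sites within a window is one way to look for regions of elevated LD.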
Thanks for your help!
Best regards,
Rubal
That sounds like a promising tool; I will give it a go. It mentions that it gives slightly different results on each run because of the stochastic search. Would you recommend running it multiple times? Also, is there an option for specifying window sizes, or would you average the per-site scores within windows post hoc? Thanks very much!
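Post-hoc windowing of per-site scores is simple to do yourself. A minimal sketch, assuming positions and scores for one chromosome have already been parsed into two lists; window_means is a hypothetical helper and the window and step sizes are purely illustrative.

```python
import bisect
import math


def window_means(positions, scores, window_size, step=None):
    """Average per-site scores in windows [start, start + window_size).

    positions: base-pair positions, matched index-by-index with scores.
    step: slide between window starts in bp; defaults to window_size,
    giving non-overlapping windows. NaN scores are ignored, and windows
    with no scored sites are skipped.
    Returns a list of (start, end, n_sites, mean_score).
    """
    if step is None:
        step = window_size
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pos = [positions[i] for i in order]
    val = [scores[i] for i in order]
    results = []
    if not pos:
        return results
    start = 0
    while start <= pos[-1]:
        end = start + window_size
        lo = bisect.bisect_left(pos, start)
        hi = bisect.bisect_left(pos, end)
        window = [v for v in val[lo:hi] if not math.isnan(v)]
        if window:
            results.append((start, end, len(window), sum(window) / len(window)))
        start += step
    return results


print(window_means([100, 250, 900, 1500], [1.0, 3.0, 2.0, 4.0], 1000))
# [(0, 1000, 3, 2.0), (1000, 2000, 1, 4.0)]
```

Reporting n_sites alongside the mean makes it easy to drop windows supported by only one or two SNPs; another common summary is the fraction of sites in a window whose absolute score exceeds some threshold.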
Running it several times will allow you to generate a confidence interval around the XP-EHH score. Window size is determined by the number of SNPs required for EHH to decay to 0.05 and isn't specified by the user.