Hi everyone:
I've been working with a genome-wide data set and want to make an LD decay plot. I've read that a lot of authors use the --r2 command implemented in PLINK for this. However, I've been unable to get it to work.
I am using the LWK HapMap3 population for testing. Here are the options from my log file (the commands I used):
Options in effect: --noweb --bfile lwk --r2 --ld-window-kb 70 --ld-window-r2 0 --ld-window 20 --out lwk_70kb_20spns
But when I average r2 in 1 kb distance bins up to 70 kb, I keep getting r2 values as high as 0.29. This is very strange because LWK is an African population, so the r2 estimates should be very low at that distance.
Here is a screenshot of my R console showing the r2 average for every 1 kb bin up to 70 kb.
http://s21.postimg.org/xag70922v/Screen_Shot_2013_10_26_at_2_30_18_AM.png
I have tried different --ld-window values, but when I use more than 20 SNPs I get r2 averages as high as 0.51 at the 70 kb distance, which is even stranger.
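For reference, the per-bin averaging described above could be done along these lines. This is only a minimal sketch in Python, assuming the .ld output is whitespace-delimited with a header row containing BP_A, BP_B and R2 columns. The function name mean_r2_by_distance and the example rows are made up for illustration and are not real LWK data.

```python
from collections import defaultdict


def mean_r2_by_distance(lines, bin_size_bp=1000):
    """Average R2 per distance bin from PLINK-style .ld rows.

    Assumes a header row naming the BP_A, BP_B and R2 columns.
    Returns {bin_index: (mean_r2, n_pairs)}, where bin_index 0 covers
    distances from 0 up to bin_size_bp - 1 base pairs, and so on.
    """
    rows = iter(lines)
    header = next(rows).split()
    bp_a, bp_b, r2 = (header.index(c) for c in ("BP_A", "BP_B", "R2"))

    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        distance = abs(int(fields[bp_b]) - int(fields[bp_a]))
        bin_index = distance // bin_size_bp
        sums[bin_index] += float(fields[r2])
        counts[bin_index] += 1

    return {b: (sums[b] / counts[b], counts[b]) for b in sorted(counts)}


# Made-up rows for illustration only (not real data).
example_rows = [
    "CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2",
    "1 1000 rs1 1 1500 rs2 0.8",
    "1 1000 rs1 1 1900 rs3 0.4",
    "1 1000 rs1 1 3200 rs4 0.1",
]
binned = mean_r2_by_distance(example_rows)
for bin_index, (mean_r2, n_pairs) in binned.items():
    print(bin_index, round(mean_r2, 3), n_pairs)
```

The number of pairs per bin is returned alongside the mean because --ld-window only keeps pairs within the given number of SNPs of each other, so it may be worth checking how many pairs each of the distant bins is actually built from. With a real file, the same function could be called as mean_r2_by_distance(open("lwk_70kb_20spns.ld")), assuming PLINK wrote its output under that name.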
Thank you all in advance.
Javier
Can you do some reality checking on the actual data you're using to get a rough idea of the expected correlation? E.g. take a few SNPs at various distances and calculate the correlation in genotypes. Always possible that the code is working as it should but your data are somehow hosed. Genotypes could be encoded incorrectly, etc.
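The spot check suggested above could look something like the following. It is a rough sketch, assuming genotypes have already been exported as allele counts (0, 1 or 2 copies of one allele, with None for missing calls); the function name genotype_r2 and the genotype vectors are invented for illustration. It computes the squared Pearson correlation between two SNPs, which, as far as I know, should be roughly comparable to what --r2 reports for unphased genotypes.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors.

    g1 and g2 hold allele counts (0, 1, 2) per individual, in the same
    order; individuals with None in either vector are dropped.
    Returns NaN if either SNP is monomorphic in the retained samples.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals with both genotypes")

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)

    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined without variation
    return (cov * cov) / (var_a * var_b)


# Invented genotypes for illustration only.
snp_a = [0, 1, 2, 1, 0, None]
snp_b = [2, 1, 0, 1, 2, 1]   # same pattern with the alleles swapped
snp_c = [1, 1, 1, 1, 1, 1]   # monomorphic

r2_ab = genotype_r2(snp_a, snp_b)
r2_ac = genotype_r2(snp_a, snp_c)
print(r2_ab, r2_ac)
```

Because the value is squared, swapping which allele is counted does not change it, so a large disagreement with the PLINK output for the same pair of SNPs would point to something other than allele coding.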
OK, I will try that, but given that I am using a HapMap3 population I don't think it is the code that is affecting my r2 estimates. I think it has something to do with the pruning parameters that I used before, because both the pruning and the r2 estimates are based on the same thing...