Hello all,
I have a question; sorry in advance if it is too simple, but I am puzzled. I am trying to use an LD-pruned set of SNPs for my population genomics analysis. I am using SNPRelate and parse the VCF file directly into the package. For the species I am working with, LD decays quite rapidly, with a big drop-off even within the first kb and a plateau nearly reached after 10 kb. When I first ran SNPRelate to generate the LD-pruned set of SNPs, I set the parameters like this:
snpset_pruned <- snpgdsLDpruning(genofile, ld.threshold = 0.2, win.size = 50000, maf = 0.03, autosome.only = FALSE)
When I got the LD-pruned set and estimated LD, I still found many SNPs with high correlation values. I do not understand why this is happening, considering that LD decays rapidly. I contacted the SNPRelate developer, and he suggested increasing the size of the sliding window, so I changed my code to this:
snpset_pruned <- snpgdsLDpruning(genofile, ld.threshold = 0.2, win.size = 2000000, maf = 0.03, autosome.only = FALSE)
I am even more puzzled now, because although the window is now quite large, I still find some SNPs with high correlation values.
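In case it helps, the kind of check I am describing looks roughly like the sketch below. It is only an illustration under a few assumptions: it runs on the example data bundled with SNPRelate rather than my own file, it leaves the pruning window at the package default, and it only looks at the first chromosome to keep the LD matrix small. It is not my exact set of commands.

```r
library(SNPRelate)

# Assumption: the bundled example GDS stands in for my own VCF-derived file.
genofile <- snpgdsOpen(snpgdsExampleFileName())

# Pruning starts at a random SNP by default, so fix the seed for repeatability.
set.seed(1)
snpset_pruned <- snpgdsLDpruning(genofile, ld.threshold = 0.2, maf = 0.03,
                                 autosome.only = FALSE, verbose = FALSE)

# The result is a list with one vector of SNP ids per chromosome;
# take only the first chromosome for this illustration.
pruned_ids <- snpset_pruned[[1]]

# Full pairwise LD matrix among the retained SNPs (slide = -1 means all pairs).
ld <- snpgdsLDMat(genofile, snp.id = pruned_ids, slide = -1,
                  method = "composite", verbose = FALSE)

# Count SNP pairs whose absolute LD is still above the pruning threshold.
ld_abs <- abs(ld$LD)
n_high <- sum(ld_abs > 0.2 & upper.tri(ld_abs), na.rm = TRUE)
print(n_high)

snpgdsClose(genofile)
```

Note that this compares every pair of retained SNPs on the chromosome, whereas the pruning step only compares SNPs that fall inside the same sliding window, and that the LD measure used for checking has to match the one used for pruning for the threshold to be comparable.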
Could someone tell me why this is happening (is it long-range LD?), or whether I am misunderstanding something here? Thanks