Hi Biostar!
After coming across a few papers claiming to find long-range LD between SNPs on different chromosomes, I decided it would be fun to try this on my own dataset.
The calculation itself is not terribly difficult, but a bit bothersome due to the huge RAM requirements.
I have an already filtered dataset with SNP data as a 0/1 matrix for every chromosome, for several populations. I would like to calculate LD (either D' or r²) between every pair of SNPs, including pairs on different chromosomes, so that is obviously a huge number of comparisons (in the millions at the very least).
The plan is then to apply some stringent threshold to that output to only get SNPs in significant LD (and then compare that across populations).
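To make the plan concrete, here is a rough sketch of what I have in mind, not a tested pipeline. It assumes each chromosome is a NumPy 0/1 array with one row per haplotype and one column per SNP, and it takes r² as the squared Pearson correlation between two columns (which matches the usual r² only if the rows really are phased haplotypes). The function names, the `threshold` of 0.8, and the `chunk` size are placeholders I made up for illustration.

```python
import numpy as np

def standardize(m):
    """Scale each SNP column to zero mean and unit variance."""
    m = np.asarray(m, dtype=np.float64)
    centered = m - m.mean(axis=0)
    sd = centered.std(axis=0)
    # r2 is undefined for monomorphic SNPs; setting sd to 1 leaves their
    # columns at zero, so they get r2 = 0 and never pass a positive threshold.
    sd[sd == 0] = 1.0
    return centered / sd

def cross_chrom_ld(chrom_a, chrom_b, threshold=0.8, chunk=1000):
    """Yield (i, j, r2) for SNP i on chrom_a and SNP j on chrom_b
    whenever r2 >= threshold. Rows of both matrices must be the same
    haplotypes in the same order (assumption)."""
    za = standardize(chrom_a)
    zb = standardize(chrom_b)
    n = za.shape[0]
    # Process chrom_a in column blocks so only a (chunk x n_snps_b)
    # r2 matrix is held in memory at any time.
    for start in range(0, za.shape[1], chunk):
        block = za[:, start:start + chunk]
        r2 = (block.T @ zb / n) ** 2
        ii, jj = np.nonzero(r2 >= threshold)
        for i, j in zip(ii, jj):
            yield start + int(i), int(j), float(r2[i, j])

# Toy example: 4 haplotypes, 2 SNPs per chromosome.
toy_a = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
toy_b = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
hits = list(cross_chrom_ld(toy_a, toy_b, threshold=0.8, chunk=1))
```

Only the pairs above the threshold are kept, so the full SNP-by-SNP matrix never has to exist in RAM at once; the cost is still one matrix product per block.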
I'm aware of several programs for calculating LD, but they seem to be mainly focused on LD decay within each chromosome, not LD between chromosomes. I'm also worried they can't handle this amount of data.
Thanks in advance for any help!
Can you get your SNPs into VCF format by any chance?