I have SNP information for 500 individuals (samples). For each SNP, I have the number of minor alleles measured in each individual (i.e. 0, 1, or 2). I need to calculate the linkage disequilibrium between each pair of SNPs. I've found this formula:
where A (or B) and a (or b) are the two possible alleles at one locus. P(xy) denotes the frequency of observing x and y together in the same haplotype, P(x) denotes the frequency of x.
I defined the frequencies as:
where #(0) is the number of individuals with a value of 0 for the specific SNP. Is this the right way to calculate linkage disequilibrium from the data I have?
You can find some information here, including formulas to calculate r^2: discrepancy between LD and R-sq. I also suggest reading the paper referenced there; it's very interesting!
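One thing to keep in mind: with 0/1/2 minor-allele counts the data are unphased, so P(AB) cannot be counted directly from the genotypes. A common workaround is to use the squared Pearson correlation between the two genotype vectors as an estimate of r^2. Here is a minimal sketch of that approach, not necessarily the method from the linked thread. The function names and the individuals-by-SNPs matrix layout are my own assumptions, and it assumes there are no missing genotypes.

```python
import numpy as np

def minor_allele_freq(g):
    """Minor-allele frequency from 0/1/2 counts: (#(1) + 2*#(2)) / (2N)."""
    g = np.asarray(g, dtype=float)
    return g.sum() / (2 * g.size)

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors.

    This is an estimate of LD r^2 for unphased data, not the
    haplotype-based value. Returns nan if either SNP has no variation.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.std() == 0 or g2.std() == 0:
        return float("nan")
    r = np.corrcoef(g1, g2)[0, 1]
    return r * r

def pairwise_r2(G):
    """r^2 estimate for every pair of SNPs.

    G is assumed to have shape (n_individuals, n_snps).
    Monomorphic SNPs produce nan entries.
    """
    G = np.asarray(G, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        R = np.corrcoef(G, rowvar=False)
    return R ** 2
```

With your data, `G` would be a 500 x (number of SNPs) array, and `pairwise_r2(G)[i, j]` would hold the estimate for SNPs `i` and `j`. If you need haplotype-based values instead, the haplotype frequencies have to be inferred first (for example by phasing or an EM estimate), which this sketch does not do.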