Hello, I have a dataset with dosage data (values between 0 and 2) for a couple of million SNPs, and I would like to get the MAF for each SNP. I saw somewhere (not a very reliable source) that you can get it simply by doing:
SNP1 <- c(0.03, 0.05, 1.95, 1.21, 0.09)
MAFSNP1 <- sum(SNP1) / (2*length(SNP1))
I compared this "MAF" from my dataset with the 1000 Genomes one, and they match.
Do you know if this is the right way to get the MAF from dosage data? Do you know of a paper or book giving the formula? Thanks a lot!
Why not use plink to calculate the frequency? It would be a lot faster than R.
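If you do want to stay in R, here is a minimal sketch of the same calculation done for many SNPs at once. Note that sum(dosage) / (2 * n) gives the frequency of whichever allele the dosage counts; it only equals the MAF when that allele happens to be the minor one, so the sketch takes min(p, 1 - p) at the end. The `dosage` matrix, its layout (samples in rows, SNPs in columns) and the SNP2 values are made up for illustration, and the handling of missing values is an assumption.

```r
# Hypothetical example data: rows = samples, columns = SNPs, dosages in [0, 2].
# SNP1 is the vector from the question; SNP2 is invented, with one missing value.
dosage <- cbind(
  SNP1 = c(0.03, 0.05, 1.95, 1.21, 0.09),
  SNP2 = c(1.90, 2.00, 1.75, NA, 1.98)
)

# Frequency of the counted allele: mean dosage per SNP divided by 2.
# na.rm = TRUE drops missing samples, so the mean is taken over the
# non-missing samples of each SNP only.
allele_freq <- colMeans(dosage, na.rm = TRUE) / 2

# Fold to the minor allele frequency: if the counted allele is the major
# one (frequency > 0.5), report 1 - frequency instead.
maf <- pmin(allele_freq, 1 - allele_freq)

print(maf)
```

colMeans() is vectorised, so this avoids looping over SNPs, but a couple of million SNPs may still not fit in memory as one matrix; processing the file in chunks would be one way around that.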