Hi all,
I have a SNP file containing allele counts across populations. Each SNP is represented by two lines in the file: the counts of allele 1 on the first line and the counts of allele 2 on the second. More precisely, each column corresponds to one population, and the number of rows is twice the number of SNPs because each pair of rows holds the counts for the two alleles of one SNP.
1 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0
0 2 0 0 0 0 0 0 0 0 0 4 0 2 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 0 0 2 1 0 0 0 0 2 4 0 0 0 2 0
The counts of allele 1 and allele 2 are assumed to sum to the sample size typed at this SNP in this population. Instead of population allele counts, how can I get the population allele frequency for each allele? I actually know how to do it in R, but my file is so big (nearly 10 GB) that I am already stuck! I am learning Perl but have not yet figured out how to do that. Any help is sincerely appreciated.
Hi @Kevin Blighe, I am not quite sure about your solution. For instance, if I take only the first and second rows of the first column, which are the allele counts for alleles A and a at locus X1 in the first population, then I would calculate the frequency of allele A in population 1 as count(A) / (count(A) + count(a)) = 1 / (1 + 0). This will be equal to 1 for allele A and 0 for allele a. In addition, missing genotypes should be accounted for as well!
Hi Ana, okay, I think that I understand it now:
This will print the minor allele frequency for each SNP (1 row per SNP):
This prints the major allele frequencies:
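A minimal Python sketch of both calculations, given as an illustration and not necessarily the commands originally posted. It assumes that the minor and major allele frequencies are defined per SNP by pooling the counts over all populations, and that a SNP with no typed samples at all is reported as `None`. The function names are hypothetical.

```python
def minor_major_frequency(line1, line2):
    """Return (minor, major) allele frequency for one SNP.

    line1 and line2 are the whitespace-separated count rows for
    allele 1 and allele 2. Counts are pooled over all populations
    (an assumption about how minor/major are defined here).
    """
    total1 = sum(int(x) for x in line1.split())
    total2 = sum(int(x) for x in line2.split())
    total = total1 + total2
    if total == 0:
        # No typed samples in any population: frequency is undefined.
        return None
    minor = min(total1, total2) / total
    return minor, 1.0 - minor


def iter_minor_major(lines):
    """Yield one result per SNP, reading the rows two at a time.

    Works on any iterable of lines (for example an open file), so the
    whole file never has to be held in memory. Blank lines are skipped;
    a trailing unpaired row is silently ignored.
    """
    rows = (line for line in lines if line.strip())
    for line1, line2 in zip(rows, rows):
        yield minor_major_frequency(line1, line2)
```

Because `iter_minor_major` is a generator, it can be looped over an open file handle directly, which matters for a file of roughly 10 GB.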
Hope that this helps!!!
Hi @Kevin Blighe, thank you so much for your incredible help. I think there is still a problem and I cannot use your new code. Instead of calculating minor and major allele frequencies, how can I calculate the frequency of allele 1 at each locus on the odd rows and of allele 2 on the even rows, to get something like below? This would solve my problem. Can you help me with that? Thanks a lot.
Maybe this is what you mean?
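A minimal Python sketch of that calculation, not necessarily the code originally posted: for every SNP and every population, each allele count is divided by the sum of the two allele counts in that column. It assumes whitespace-separated integer counts, and it writes `NA` where a population has no typed samples at a SNP (one possible way to handle missing genotypes). The names `allele_frequencies`, `convert` and `MISSING` are hypothetical.

```python
import sys

MISSING = "NA"  # assumed placeholder for populations with no typed samples


def allele_frequencies(line1, line2):
    """Convert one SNP's two count rows into two frequency rows."""
    counts1 = [int(x) for x in line1.split()]
    counts2 = [int(x) for x in line2.split()]
    if len(counts1) != len(counts2):
        raise ValueError("the two allele rows have different numbers of columns")
    freqs1, freqs2 = [], []
    for c1, c2 in zip(counts1, counts2):
        total = c1 + c2  # sample size typed in this population
        if total == 0:
            freqs1.append(MISSING)
            freqs2.append(MISSING)
        else:
            freqs1.append(f"{c1 / total:.4f}")
            freqs2.append(f"{c2 / total:.4f}")
    return freqs1, freqs2


def convert(infile, outfile):
    """Stream the counts file two rows at a time and write frequencies."""
    rows = (line for line in infile if line.strip())  # skip blank lines
    for line1 in rows:
        line2 = next(rows, None)
        if line2 is None:
            raise ValueError("odd number of rows: last SNP has no allele 2 line")
        freqs1, freqs2 = allele_frequencies(line1, line2)
        outfile.write(" ".join(freqs1) + "\n")  # allele 1 on the odd row
        outfile.write(" ".join(freqs2) + "\n")  # allele 2 on the even row


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script and file names are placeholders):
    #   python allele_freq.py counts.txt freqs.txt
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        convert(fin, fout)
```

Only two rows are held in memory at any time, so the size of the input file should not be a problem. The output keeps the layout of the input: allele 1 frequencies on odd rows, allele 2 frequencies on even rows, one column per population.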
yesss, this is exactly what I want. thanks a lot.
Phewww! Thanks! Goodnight :)