SNP Frequency in Complete Genomics Data
11.9 years ago
Pappu ★ 2.1k

I am trying to analyze SNP frequencies in the 69 Complete Genomics genomes. For example:

2951574 chr2 85624896 85624897 snp C G dbsnp.132:rs113793303 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 01 00 01 00 00 00 01 01 00 00 00

So I took columns 11-79 and counted the frequencies of 00, 01/10 and 11 with a Python script. The frequency of C would then be (numberof00 + numberof01/2 + numberof10/2)/69 = 65/69 ≈ 0.94. Is that correct?
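The genotype-based formula above can be sketched as follows. This is only a minimal illustration, assuming each call is a two-character string such as 00, 01, 10 or 11; the function name `ref_allele_frequency` and the toy input are made up for the example and are not taken from the original script.

```python
from collections import Counter

def ref_allele_frequency(genotypes):
    """Frequency of the reference allele ('0') from two-character genotype calls."""
    counts = Counter(genotypes)
    hom_ref = counts["00"]
    het = counts["01"] + counts["10"]
    # A homozygous-reference genome carries two reference copies and a
    # heterozygote carries one, so a heterozygote counts as half a genome.
    return (hom_ref + het / 2) / len(genotypes)

# Toy input (not the row from the post): three "00" calls and one "01" call
calls = ["00", "00", "01", "00"]
print(ref_allele_frequency(calls))  # (3 + 0.5) / 4 = 0.875
```

Dividing by the number of genomes after halving the heterozygotes is the same as dividing the number of 0 characters by the total number of alleles.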

python • 2.5k views
11.9 years ago

Yeah, the frequencies look right. But you could just count the 0s and 1s directly, since each character is one allele. Something like this:

# `line` is one row of the table, e.g. obtained while looping over the file
data = line.strip().split()
alleleA = data[5]
alleleB = data[6]
# in the example row the genotype calls start at the 9th column (index 8)
alleleString = ''.join(data[8:])
ACount = float(alleleString.count('0'))
BCount = float(alleleString.count('1'))
# every genome contributes two characters, so len(alleleString) is the allele total
print(alleleA + ' frequency: ' + str(ACount / len(alleleString)))
print(alleleB + ' frequency: ' + str(BCount / len(alleleString)))
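One caveat with dividing by the full string length is that any character other than 0 or 1 would still be counted in the denominator. Below is a hedged variant that divides by the called alleles only. It assumes that a no-call, if present, is written with some other character (for instance N); check the file's format documentation before relying on that. The function name, the `first_genotype_col` parameter and the sample row are hypothetical.

```python
def allele_frequencies(line, first_genotype_col=8):
    """Return {allele: frequency} for one row, ignoring anything that is not 0 or 1."""
    fields = line.strip().split()
    ref, alt = fields[5], fields[6]
    alleles = "".join(fields[first_genotype_col:])
    ref_count = alleles.count("0")
    alt_count = alleles.count("1")
    # Assumption: characters other than '0' and '1' (e.g. 'N') are no-calls
    called = ref_count + alt_count
    if called == 0:
        return {ref: float("nan"), alt: float("nan")}
    return {ref: ref_count / called, alt: alt_count / called}

# Made-up row in the same layout as the example: four genomes, one half-called
row = "1 chr1 100 101 snp A T rsX 00 01 11 0N"
print(allele_frequencies(row))
```

If the row contains only 0s and 1s, this gives the same result as dividing by the string length.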

Thanks, I was counting 01, 10, 11, 00 separately!

11.9 years ago
Peixe ▴ 660

You could take a look at the --get-INFO <string> option of vcftools.

Put the INFO tag for the population of interest in the <string> field to retrieve the frequency of the ALTERNATE allele in that population. For example, for Europeans use EUR_AF, and so on...

It's very straightforward! ;)
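To make clear what kind of value is being pulled out, here is a rough Python sketch that reads one tag from the INFO column of a VCF line. It is not vcftools and does not reproduce its output format; it assumes a tab-separated VCF data line whose INFO column carries a tag such as EUR_AF. The function name and the record are invented for illustration.

```python
def get_info_value(vcf_line, key):
    """Return the value of `key` from the INFO column of one VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th column of a VCF data line
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value  # returned as a string, exactly as written in the file
    return None

# Made-up record for illustration only
record = "2\t12345\trs0000\tC\tG\t100\tPASS\tAF=0.06;EUR_AF=0.05;AN=138"
print(get_info_value(record, "EUR_AF"))  # prints 0.05
```

The value is the alternate-allele frequency as annotated in the file, so it is only available when the VCF actually contains that tag.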
