Genotype Frequencies Calculation
11.3 years ago
Peixe ▴ 660

Hi there,

Could anyone show me a straightforward method to retrieve genotypic frequencies from a tped or vcf?

I mean the observed genotype frequencies, not the expected frequencies calculated from the allele frequencies under Hardy-Weinberg equilibrium. I found a way using plink's --hardy option, which reports the genotype counts, among many other things, and the frequencies can be computed from those counts. But I was wondering whether there is a simpler way, analogous to the --freq option that plink and vcftools offer for allele frequencies. I know this may be a silly question, but I have not found anything.

Thanks in advance,

P.

genotype vcf plink • 7.8k views

I don't understand the problem, please clarify. Wouldn't --freq with --counts give you counts? Also, try the --model option; it gives all sorts of counts, too.


I think it is clear enough... It is simply (to) "retrieve genotypic frequencies from a tped or vcf". Not genotype counts, but frequency numbers directly. I was just wondering if there was a method to retrieve it in the same format as when you retrieve the allele frequencies with vcftools or plink. That's all...

11.3 years ago
zx8754 12k

The following one-liner converts PLINK --hardy output (the .hwe file) from counts to frequencies:

# remove the header, replace "/" with tabs, calculate frequencies, write to a new file
# (tr is used instead of sed 's:/:\t:g' because "\t" in a sed replacement is a GNU extension)
sed 1d myfile.hwe | \
tr '/' '\t' | \
awk '{OFS="\t"; n=$6+$7+$8; print $1,$2,$3,$4,$5,($6/n)"/"($7/n)"/"($8/n),$9,$10,$11}' \
> myfile.hwe.freq

Example:

#input
   CHR         SNP     TEST   A1   A2                 GENO   O(HET)   E(HET)            P
  22   rs2027653      ALL    C    T        489/1585/1498   0.4437   0.4601       0.0349
  22   rs2027653      AFF    C    T          241/772/752   0.4374   0.4581      0.06132
  22   rs2027653    UNAFF    C    T          248/813/746   0.4499    0.462        0.263
#output
22      rs2027653       ALL     C       T       0.136898/0.443729/0.419373     0.4437   0.4601  0.0349
22      rs2027653       AFF     C       T       0.136544/0.437394/0.426062     0.4374   0.4581  0.06132
22      rs2027653       UNAFF   C       T       0.137244/0.449917/0.412839     0.4499   0.462   0.263
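
The same conversion can also be done in a single awk pass, which avoids the intermediate substitution step and does not abort on a SNP with no called genotypes. This is only a sketch, assuming the .hwe layout shown in the example above (GENO in column 6); the function name hwe_counts_to_freq and the NA/NA/NA placeholder are made up for illustration.

```shell
# Hypothetical helper: convert the GENO column of a PLINK .hwe file from
# counts to frequencies. Reads the file(s) given as arguments, or stdin.
hwe_counts_to_freq() {
    awk 'BEGIN { OFS = "\t" }
    NR == 1 { next }                      # skip the header line
    {
        split($6, g, "/")                 # g[1], g[2], g[3] = genotype counts
        n = g[1] + g[2] + g[3]
        if (n > 0)
            $6 = sprintf("%.6f/%.6f/%.6f", g[1] / n, g[2] / n, g[3] / n)
        else
            $6 = "NA/NA/NA"               # assumed placeholder: no called genotypes
        print                             # assigning $6 rebuilds the line with OFS
    }' "$@"
}
```

It would be called as: hwe_counts_to_freq myfile.hwe > myfile.hwe.freq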

Nice one! I had already written some small code in Python to do it, but this is cleaner. I guess there is no direct way to retrieve it, then... Thanks!


Note that because awk rounds each frequency to six significant digits when it prints it, the three printed frequencies will not always sum to exactly 1 (i.e. 100%).
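
The effect is easy to reproduce with three equal genotype counts. A small sketch (the function name rounding_demo is hypothetical): awk computes in double precision, but it uses six significant digits by default when it turns a number into text, and those rounded values are what end up in the output file.

```shell
# Hypothetical demonstration with genotype counts 1/1/1.
rounding_demo() {
    awk 'BEGIN {
        f = 1 / 3                          # frequency of each genotype
        printed = sprintf("%.6g", f)       # the text awk emits by default: 0.333333
        printf "internal sum: %.6f\n", f + f + f
        printf "sum of printed values: %.6f\n", printed * 3
    }'
}
rounding_demo
```

The first line reports 1.000000 and the second 0.999999, so any discrepancy comes from the rounding at print time rather than from the division itself. Printing more digits (for example with printf and "%.10f") shrinks the gap but does not remove it in general.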


Yes, I noticed that. With Python, though, the frequencies do sum to 1.

11.3 years ago
Adam ★ 1.0k

If I understand your question correctly, the information you're looking for is contained in the output of --hardy in vcftools.
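
For a VCF, the counts from vcftools can be converted in the same spirit. The sketch below is not a vcftools feature; it assumes the .hwe file written by something like "vcftools --vcf input.vcf --hardy --out mydata" has a header line and carries the observed counts in its third column as HOM1/HET/HOM2, so check the header of your own file first. The function name vcftools_hwe_to_freq is hypothetical.

```shell
# Hypothetical helper: print CHR, POS and the three observed genotype
# frequencies from a vcftools --hardy output file (or stdin).
# Assumption: column 3 holds the observed counts as HOM1/HET/HOM2.
vcftools_hwe_to_freq() {
    awk 'NR == 1 { next }                 # skip the header line
    {
        split($3, g, "/")                 # g[1], g[2], g[3] = observed counts
        n = g[1] + g[2] + g[3]
        if (n > 0)                        # sites with no called genotypes are skipped
            printf "%s\t%s\t%.6f\t%.6f\t%.6f\n", $1, $2, g[1] / n, g[2] / n, g[3] / n
    }' "$@"
}
```

It would be called as: vcftools_hwe_to_freq mydata.hwe > mydata.geno.freq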


Yes, but it's pretty much the same as what I did with plink. I was asking for a way to retrieve the genotypes and their frequencies directly, as vcftools does for the allele frequencies. Thanks anyway! :)
