Question

Interpreting ANNOVAR allele frequency output

8.8 years ago

eric.kern13 ▴ 240

Hi Biostars,

I'm using ANNOVAR to annotate some WGS data. I want to pare down the list until I am left with variants of very low frequency. I've got ANNOVAR working, but I'd appreciate your help interpreting its output. Here's what I did.

To split out the rare variants, I issued this command (or rather, a version of it that works on my cluster).

perl annotate_variation.pl --filter --buildver hg19 --maf_threshold 0.0001 --dbtype 1000g2014oct_all --outfile maf-1e4 --comment my_variants.avinput <path_to_humandb>

I got 5.4 million common variants in the file maf-1e4.hg19_ALL.sites.2014_10_dropped and another 1.1 million rare variants in maf-1e4.hg19_ALL.sites.2014_10_filtered. Great, except that there was no column giving allele frequencies for the rare variants, although there was one for the common variants. I figured maybe only dropped variants get annotated (though that would be weird and annoying), so I tried this hack to get ANNOVAR to drop my rare variants:

cp maf-1e4.hg19_ALL.sites.2014_10_filtered rare_variants.avinput

perl annotate_variation.pl --filter --buildver hg19 --maf_threshold 0.0000000001 --dbtype 1000g2014oct_all --outfile maf-1e10 --comment rare_variants.avinput <path_to_humandb>

The result: all of my variants passed the filter!

wc -l maf-1e10.*
   0 maf-1e10.hg19_ALL.sites.2014_10_dropped
   1114818 maf-1e10.hg19_ALL.sites.2014_10_filtered

So there are no variants with a frequency between 1 copy in 10,000 and 1 copy in 10 billion, but loads that are rarer than one copy on Earth? Unbelievable! Here's what I think actually happened: 1000 Genomes, having on the order of 1000 genomes to work with, cannot tell the difference between 0.01% and one copy per multiverse. So I should interpret my 1.1 million variants as all being rare enough that they do not appear in 1000 Genomes at all. Is that right?
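The resolution argument can be checked with a little arithmetic. This is a rough sketch only: it assumes the 1000g2014oct tables are built from a panel of 2,504 diploid individuals (the 1000 Genomes Phase 3 size, which is not stated anywhere in this thread), and the function name is made up for illustration.

```python
def min_nonzero_allele_frequency(n_individuals: int, ploidy: int = 2) -> float:
    """Smallest nonzero frequency a panel can report: one copy among all sampled chromosomes."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return 1.0 / (ploidy * n_individuals)


# Assumption: panel size of 1000 Genomes Phase 3; not taken from this thread.
N_INDIVIDUALS = 2504
floor = min_nonzero_allele_frequency(N_INDIVIDUALS)

print(f"smallest observable nonzero frequency: {floor:.6f}")
for threshold in (1e-4, 1e-10):
    # A threshold below the floor can only separate "seen in the panel" from "not seen".
    print(f"threshold {threshold:g} is below the panel floor: {threshold < floor}")
```

Under that assumption, both thresholds fall below the lowest frequency the panel can report, so either one would split the input the same way: variants seen at least once in the panel versus variants not seen at all.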

Thanks!

sequencing genome • 4.2k views

score 1 · Accepted Answer · 2016-10-06

Below, I quote a response from Kai Wang, ANNOVAR creator/maintainer:

you should just use table_annovar.pl to print out allele frequency for all variants in your input. The word "minor allele frequency" cannot be defined well, because rare allele in one population will be common allele in another population, and generally should NEVER be used in genetics, and because the reference genome does contain many sites that have REFERENCE allele being rare allele in any human populations.

Another note is that it looks like those in "filtered" file in your question are those that are not annotated in 1000G, so you will not have an allele frequency measure. Nothing unexpected here.
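If you would rather stay with the two files from the filter step than rerun with table_annovar.pl, a small script can stitch them into one table. This is only a sketch under assumptions: that each line of the "dropped" file is the database name, the frequency, and then the original avinput columns, and that each line of the "filtered" file is the untouched avinput line. The function name and the sample rows below are invented for illustration, so check the layout against your own files first.

```python
def merge_filter_output(dropped_lines, filtered_lines, missing="NA"):
    """Yield (chrom, start, end, ref, alt, frequency) for every input variant.

    Assumed layouts (verify against your own files):
      dropped:  db_name, frequency, chrom, start, end, ref, alt, [extra columns]
      filtered: chrom, start, end, ref, alt, [extra columns]
    """
    for line in dropped_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        yield (*fields[2:7], fields[1])
    for line in filtered_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        # Not present in the database, so there is no frequency to report.
        yield (*fields[:5], missing)


# Hypothetical example rows, not real data.
dropped = ["1000g2014oct_all\t0.02\t1\t1000\t1000\tA\tG\tcomment"]
filtered = ["1\t2000\t2000\tC\tT\tcomment"]

rows = list(merge_filter_output(dropped, filtered))
for row in rows:
    print("\t".join(row))
```

Variants from the "filtered" file get a placeholder instead of a frequency, which matches the point above: absence from the database is not the same as a measured frequency of zero.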