Hi Biostars,
I'm using ANNOVAR to annotate some WGS data. I want to pare down the list until I am left with variants of very low frequency. I've got ANNOVAR working, but I'd appreciate your help interpreting its output. Here's what I did.
To split out the rare variants, I issued this command (or rather, a version of it that works on my cluster).
perl annotate_variation.pl --filter --buildver hg19 --maf_threshold 0.0001 --dbtype 1000g2014oct_all --outfile maf-1e4 --comment my_variants.avinput <path_to_humandb>
I got 5.4 million common variants in the file maf-1e4.hg19_ALL.sites.2014_10_dropped and another 1.1 million rare variants in maf-1e4.hg19_ALL.sites.2014_10_filtered. Great, except that there was no column giving allele frequencies for the rare variants! There was one for the common variants, though. I figured maybe only dropped variants get annotated (though that would be weird and annoying), so I tried this here hack to get ANNOVAR to drop my rare variants:
cp maf-1e4.hg19_ALL.sites.2014_10_filtered rare_variants.avinput
perl annotate_variation.pl --filter --buildver hg19 --maf_threshold 0.0000000001 --dbtype 1000g2014oct_all --outfile maf-1e10 --comment rare_variants.avinput <path_to_humandb>
The result: all of my variants passed the filter!
wc -l maf-1e10.*
0 maf-1e10.hg19_ALL.sites.2014_10_dropped
1114818 maf-1e10.hg19_ALL.sites.2014_10_filtered
So none of my variants fall between 1 copy in 10,000 and 1 copy in 10 billion, yet over a million of them are rarer than one copy on Earth? Unbelievable! Here's what I think actually happened: 1000 Genomes, having only on the order of a thousand genomes to work with, cannot tell the difference between 0.01% and one copy per multiverse. So I should interpret my 1.1 million variants as all being rare enough that they do not appear in 1000 Genomes at all. Is that right?
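The arithmetic behind that guess can be checked with a toy Python sketch. It assumes the 1000g2014oct_all track corresponds to the 1000 Genomes phase 3 release of 2,504 individuals (5,008 autosomal allele copies), and it models the ANNOVAR filter with a simplified, assumed rule: a variant is dropped only if the panel reports it with a frequency above the threshold, otherwise it goes to the filtered file. That rule is my reading of the behaviour above, not ANNOVAR's actual code; the function names, variant names and frequencies are all made up. Under those assumptions, any variant the panel observed at all has a frequency of at least 1/5,008 (about 0.0002), which is already above 0.0001, so the first run would drop every observed variant and the second run would have nothing left to drop, whatever the threshold.

```python
# Hypothetical sketch, NOT ANNOVAR's implementation: a simplified model of a
# frequency filter, plus the smallest allele frequency a panel can report.

N_INDIVIDUALS = 2504  # 1000 Genomes phase 3 samples (assumed to match 1000g2014oct)
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # autosomal allele copies in the panel


def min_nonzero_af(n_chromosomes: int) -> float:
    """Smallest non-zero frequency a panel can report: one copy among n_chromosomes."""
    return 1 / n_chromosomes


def split_by_threshold(variants, known_af, threshold):
    """ASSUMED rule: a variant is 'dropped' only if the panel reports it with a
    frequency above `threshold`; everything else lands in 'filtered'."""
    dropped, filtered = [], []
    for v in variants:
        af = known_af.get(v)  # None: the panel never observed this variant
        if af is not None and af > threshold:
            dropped.append(v)
        else:
            filtered.append(v)
    return dropped, filtered


min_af = min_nonzero_af(N_CHROMOSOMES)
print(f"smallest reportable AF: {min_af:.7f}")  # smallest reportable AF: 0.0001997

# Made-up toy panel: one common variant, one singleton, one never observed.
panel = {"var_common": 0.25, "var_singleton": min_af}
mine = ["var_common", "var_singleton", "var_novel"]

# First pass at 0.0001: the singleton is already above the threshold.
dropped, filtered = split_by_threshold(mine, panel, 1e-4)
print(dropped, filtered)  # ['var_common', 'var_singleton'] ['var_novel']

# Second pass on the survivors with a far stricter threshold: nothing changes,
# because an unobserved variant has no frequency to compare against.
dropped2, filtered2 = split_by_threshold(filtered, panel, 1e-10)
print(dropped2, filtered2)  # [] ['var_novel']
```

If that model is right, the filtered file from the first run is simply "absent from 1000 Genomes", and lowering the threshold further cannot split it any finer.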
Thanks!