Hello!
First of all, here is an overview of my bioinformatics NGS analysis for exome amplicon sequencing (only 50 genes):
- Mapping, recalibration, and variant calling with GATK HaplotypeCaller
- Variant annotation: I used ANNOVAR, VEP, SnpEff, and vtools
- Combining the required annotations into one file (CSV format). The idea is to get a complete annotation, including KEGG, GO, Kaviar, ClinVar, 1000g2015aug, refGene, thousandGenomes, and LOF, in order to perform knowledge-based functional filtering.
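To make the combining step concrete, here is a minimal sketch of merging per-tool annotation rows into one CSV. It assumes each tool's output has already been parsed into dictionaries that share the key columns `chrom`, `pos`, `ref`, and `alt`. Those column names, the helper names, and the position value are hypothetical and only for illustration; they are not the actual output format of any of the tools.

```python
import csv
import io

# Assumed key columns shared by every annotation table (hypothetical names).
KEY = ("chrom", "pos", "ref", "alt")


def merge_annotations(tables):
    """Merge lists of row dicts from several annotators on the variant key."""
    merged = {}
    for rows in tables:
        for row in rows:
            key = tuple(row[k] for k in KEY)
            # Start a record with the key columns, then add this tool's fields.
            merged.setdefault(key, dict(zip(KEY, key))).update(row)
    return list(merged.values())


def to_csv(rows):
    """Write merged rows as CSV text; annotations a tool did not report become '.'."""
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, restval=".")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up variant position; the frequency values are the ones quoted in this post.
annovar = [{"chrom": "1", "pos": "100", "ref": "A", "alt": "G",
            "Kaviar_AF": "0.0001153"}]
vep = [{"chrom": "1", "pos": "100", "ref": "A", "alt": "G",
        "EUR_MAF": "G:0.9791"}]

combined = merge_annotations([annovar, vep])
print(to_csv(combined))
```

Keying on chromosome, position, reference, and alternate allele keeps annotations for different alleles at the same site from being mixed together.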
I am very confused about the output, in particular about the discrepancy between the allele frequencies from Kaviar, thousandGenomes, and EUR_MAF. For example, one mutation has Kaviar_AF=0.0001153, thousandGenomes_AF_INFO=0.69, and EUR_MAF=G:0.9791. How should we decide which database and annotations to use? As I understand it, there is no benchmark method for annotation, but there may be criteria for making the choice, or a statistical method on which to base our annotation and filtering decisions.
Is there any discussion of the discrepancies found between different databases, and are there suggested criteria for filtering annotations?
Thanks in advance for any suggestion.
Variant frequencies are population-specific, especially when the variant is rare.
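One way to act on that when filtering is to compare the frequencies side by side and use the highest one reported for the allele in any population, rather than trusting a single database. The sketch below is only an illustration of that idea, not a benchmarked method. It assumes the alternate allele is `G` (the question does not say), that `EUR_MAF` uses an `ALLELE:FREQ` format, and that 0.01 is the rarity cutoff. The function names are hypothetical.

```python
def parse_allele_freq(value, alt):
    """Return the frequency for `alt` from 'ALLELE:FREQ' or a plain number, else None."""
    if value in (None, "", "."):
        return None
    if ":" in value:
        allele, freq = value.split(":", 1)
        # Ignore frequencies reported for a different allele than the one called.
        return float(freq) if allele == alt else None
    return float(value)


def max_population_af(annotations, alt):
    """Highest frequency of `alt` across all annotation fields, or None if absent."""
    freqs = [parse_allele_freq(v, alt) for v in annotations.values()]
    freqs = [f for f in freqs if f is not None]
    return max(freqs) if freqs else None


def is_rare(annotations, alt, threshold=0.01):
    """Treat a variant as rare only if no population reaches the threshold.

    Assumption: a variant missing from every database is kept as rare.
    """
    top = max_population_af(annotations, alt)
    return top is None or top < threshold


# Values quoted in the question; the alt allele 'G' is an assumption.
example = {
    "Kaviar_AF": "0.0001153",
    "thousandGenomes_AF_INFO": "0.69",
    "EUR_MAF": "G:0.9791",
}

print(max_population_af(example, "G"))
print(is_rare(example, "G"))
```

Under these assumptions the example variant is common in at least one population, so it would not pass a rare-variant filter even though its Kaviar frequency is very low. Whether to take the maximum or to use only the population matching the samples is a design choice that depends on the cohort.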