How to choose variant filtering criteria to reduce false positives
8.8 years ago
nikkihathi ▴ 30

Hello!

First, here is an overview of my NGS analysis pipeline for exome amplicon sequencing (only 50 genes):

  • Mapping, recalibration, and variant calling with GATK HaplotypeCaller
  • Variant annotation with Annovar, VEP, SnpEff and vtools
  • The required annotations combined into one CSV file. The idea is to have a complete annotation (KEGG, GO, Kaviar, ClinVar, 1000g2015aug, refGene, thousandGenomes, LOF) in order to perform knowledge-based functional filtering.

I am confused by the output, in particular by the discrepancies between the allele frequencies reported by Kaviar, thousandGenomes and EUR_MAF. For example, one variant has Kaviar_AF=0.0001153, thousandGenomes_AF_INFO=0.69 and EUR_MAF=G:0.9791. How should we decide which database and which annotations to trust? As I understand it, there is no benchmark method for annotation, but there may be criteria, or a statistical method, on which to base the choice of annotation and filtering.

Is there any discussion of the discrepancies found between different databases, and are there suggested criteria for filtering on annotations?

Thanks in advance for any suggestions.

next-gen variant annotation • 2.2k views

Variant frequencies are population-specific, especially if the variant is rare.
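To make the comparison concrete, here is a minimal Python sketch that parses the three values from the question and flags a variant whose frequency is rare in one source but common in another. The field names come from the question; the `parse_af` and `summarize` helpers and the 0.01 cutoff are assumptions for illustration, and taking the maximum frequency across sources is only one conservative convention, not an established standard.

```python
from typing import Dict, Optional, Tuple


def parse_af(value: str) -> Tuple[Optional[str], float]:
    """Parse '0.69' or 'G:0.9791' into (allele or None, frequency)."""
    allele, sep, freq = value.strip().rpartition(":")
    return (allele if sep else None, float(freq))


def summarize(afs: Dict[str, str], rare_cutoff: float = 0.01) -> dict:
    """Compare frequencies across sources (rare_cutoff is an assumed threshold)."""
    parsed = {name: parse_af(v) for name, v in afs.items()}
    freqs = [f for _, f in parsed.values()]
    max_af, min_af = max(freqs), min(freqs)
    return {
        "max_af": max_af,
        "min_af": min_af,
        # Rare in at least one source but common in another: inspect by hand.
        "discordant": min_af < rare_cutoff <= max_af,
        # Conservative rule: call it rare only if rare in every source.
        "keep_as_rare": max_af < rare_cutoff,
    }


# Values copied from the question.
example = {
    "Kaviar_AF": "0.0001153",
    "thousandGenomes_AF_INFO": "0.69",
    "EUR_MAF": "G:0.9791",
}
result = summarize(example)
print(result)
```

For a discordant variant like this one, it is worth checking which allele each source reports (the EUR_MAF value names the allele G explicitly) and which population each frequency was computed in, before trusting any single number.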

8.6 years ago
chen ★ 2.5k

Good question.

I usually filter variants on confidence plus importance.
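One possible reading of that, as a rough Python sketch: "confidence" as call quality and read depth, "importance" as the predicted impact class from the annotator. The `Variant` class, the thresholds (QUAL 30, depth 20) and the choice of impact classes are all assumptions for illustration and would need tuning for an amplicon panel.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Variant:
    qual: float   # call quality (e.g. the VCF QUAL column)
    depth: int    # read depth at the site
    impact: str   # annotator impact class: HIGH, MODERATE, LOW, MODIFIER


def is_confident(v: Variant, min_qual: float = 30.0, min_depth: int = 20) -> bool:
    """Confidence: the call is well supported (thresholds are assumed)."""
    return v.qual >= min_qual and v.depth >= min_depth


def is_important(v: Variant, impacts: Tuple[str, ...] = ("HIGH", "MODERATE")) -> bool:
    """Importance: the predicted effect is likely to matter functionally."""
    return v.impact in impacts


def passes(v: Variant) -> bool:
    return is_confident(v) and is_important(v)


# Made-up variants, only to show the filter in action.
variants = [
    Variant(qual=500.0, depth=120, impact="MODERATE"),
    Variant(qual=12.0, depth=8, impact="HIGH"),
    Variant(qual=900.0, depth=300, impact="MODIFIER"),
]
kept = [v for v in variants if passes(v)]
print(len(kept), "of", len(variants), "variants kept")
```

Only the first variant survives here: the second fails on confidence and the third on importance.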

You were using GATK, so you were doing germline variant calling, right? That's relatively easy and stable.

Somatic mutation calling is trickier.
