6.8 years ago
ahmad mousavi
Hello
Following the GATK4 best practices pipeline, I made a VCF file from the WES data of 4 people. When I try to annotate it with ANNOVAR, it cannot annotate all of the variants: nearly 70% of them are discarded and written to invalid_input.
I thought it might be due to the VCF version (4.2), but it doesn't work with ANNOVAR's default input format (avinput) either.
What is your suggestion for annotating GATK4 output VCFs?
Please paste some of the variants that are stored in invalid_input
A better approach may be to first split your variants into 4 different avinput files, and then annotate these:
Thanks for the reply. The problem still exists. I have 5 samples (output from HaplotypeCaller, merged with bcftools), but after running your command each sample ends up in a separate file, and it is hard to track a SNP across all 5 samples.
Can I annotate all 5 samples individually and then merge the annotated results into one file?
It gives the following information. I lost almost 50,000 SNPs and indels.
Without seeing your VCF, I cannot understand entirely what is going on. The only filter that could result in substantial loss of variants is the
--snpqual <float>
filter passed to convert2annovar.pl, with an initial value set to 20. Other possibilities to consider:
You could indeed process each sample independently with the
--allsample
flag, and then merge it all back together. You should keep track of which variants were called in which individual, though. When the number of samples is low, I believe that doing it this way is fine.
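For the merging step, here is a minimal sketch in Python of one way to keep track of which individual carries each variant. It assumes each per-sample file is in avinput format, where the first five columns are chromosome, start, end, ref and alt. The function names (`merge_avinput`, `format_merged`) and the sample names are hypothetical and not part of ANNOVAR.

```python
from collections import defaultdict


def merge_avinput(per_sample_lines):
    """Collect, for every variant, the list of samples in which it was called.

    per_sample_lines: dict mapping a sample name (hypothetical labels) to an
    iterable of avinput lines for that sample.
    Returns a dict keyed by (chr, start, end, ref, alt).
    """
    carriers = defaultdict(list)
    for sample, lines in per_sample_lines.items():
        for line in lines:
            fields = line.split()
            # Skip blank lines, comments, and lines too short to be a variant.
            if len(fields) < 5 or fields[0].startswith("#"):
                continue
            key = tuple(fields[:5])
            if sample not in carriers[key]:
                carriers[key].append(sample)
    return dict(carriers)


def format_merged(carriers):
    """Return one tab-separated row per variant with a trailing carrier list.

    Chromosomes are sorted as plain strings here (so "10" sorts before "2");
    adjust the sort key if a karyotypic order is needed.
    """
    ordered = sorted(carriers, key=lambda k: (k[0], int(k[1]), int(k[2]), k[3], k[4]))
    return ["\t".join(k) + "\t" + ",".join(carriers[k]) for k in ordered]
```

In practice, each value passed to `merge_avinput` could be an open file handle for one of the per-sample avinput files. The merged rows keep the five avinput columns first, so they could in principle be annotated once, with the carrier list carried along as an extra column.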