Hello Everyone,
I am a fairly novice bioinformatician and need some help and suggestions on tools I can use to subset my annotated VCF file using specific criteria. The criteria are: (i) coding and splice-site variants; (ii) CADD > 10 for nonsynonymous SNPs; (iii) amino-acid change: nonsense; (iv) absent from the ExAC database; (v) frequency in Kaviar: 6.4E-06. I am writing Python code for this because I couldn't find any tool that meets my needs. So far I have tried GATK's VariantsToTable and VariantFiltration, bcftools, and vcftools. Are there any tools out there that can parse the INFO column of a VCF file and filter/subset the file based on selected criteria? Thank you in advance for your help!
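Since the plan is to do this in Python, here is a minimal sketch of an INFO-column filter that streams the VCF line by line and keeps the header and the FORMAT/sample columns intact. Everything specific in it is an assumption: the INFO key names (Func.refGene, ExonicFunc.refGene, CADD_phred, ExAC_ALL, Kaviar_AF) and category values (exonic, splicing, stopgain, nonsynonymous_SNV) are modelled on ANNOVAR-style annotation and must be checked against the ##INFO lines of the actual file; the way criteria (i) to (v) combine is one possible reading; and the Kaviar value is treated as an upper bound. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical INFO keys modelled on ANNOVAR-style annotation; check the ##INFO
# header lines of your own VCF and rename these to match.
FUNC_KEY = "Func.refGene"
EXONIC_FUNC_KEY = "ExonicFunc.refGene"
CADD_KEY = "CADD_phred"
EXAC_KEY = "ExAC_ALL"
KAVIAR_KEY = "Kaviar_AF"

CODING_OR_SPLICE = {"exonic", "splicing"}
MISSING = {"", "."}


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split an INFO string such as 'A=1;B=x;FLAG' into a dict; flags map to None."""
    fields: Dict[str, Optional[str]] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert an INFO value to float; missing ('.') or non-numeric values give None."""
    if value is None or value in MISSING:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def keep(info: Dict[str, Optional[str]], cadd_min: float = 10.0,
         kaviar_max: float = 6.4e-06) -> bool:
    """One possible reading of criteria (i)-(v); adjust the logic to your intent."""
    func = info.get(FUNC_KEY)
    # (i) coding or splice-site variants only
    if func not in CODING_OR_SPLICE:
        return False
    # (iv) absent from ExAC: no numeric ExAC frequency recorded
    if to_float(info.get(EXAC_KEY)) is not None:
        return False
    # (v) ASSUMPTION: Kaviar frequency must not exceed kaviar_max; missing counts as rare
    kaviar = to_float(info.get(KAVIAR_KEY))
    if kaviar is not None and kaviar > kaviar_max:
        return False
    exonic_func = info.get(EXONIC_FUNC_KEY)
    # (iii) nonsense (stop-gain) variants are kept without a CADD cut-off
    if exonic_func == "stopgain":
        return True
    # (ii) nonsynonymous SNVs are kept only if CADD exceeds cadd_min
    if exonic_func == "nonsynonymous_SNV":
        cadd = to_float(info.get(CADD_KEY))
        return cadd is not None and cadd > cadd_min
    # splice-site variants outside exons are kept; other exonic classes are dropped
    return func == "splicing"


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines that pass keep()."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines so the output is still a valid VCF
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 7 and keep(parse_info(cols[7])):
            yield line  # FORMAT and per-sample columns pass through untouched


def filter_file(in_path: str, out_path: str) -> None:
    """Stream a plain or gzipped VCF line by line, so memory use stays flat."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_lines(src))
```

For example, filter_file("annotated.vcf.gz", "subset.vcf") (hypothetical file names) would write an uncompressed VCF containing the full header plus the passing records. Because the sample columns are copied verbatim, all sample IDs and genotypes survive the filtering.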
Hi Manuel, thank you for your response. I tried filtering with tabular.txt, but it is missing the sample IDs that are present in the corresponding VCF file, so it is not very helpful. The VCF file is around 95 GB and has 1048 samples. Is it normal for tabular.txt not to have sample IDs?
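On where the sample IDs live: in a VCF they are not stored per variant. They appear once, as the columns after FORMAT in the #CHROM header line, and each record then carries one genotype column per sample in the same order. A table that drops those columns loses the link to samples. Below is a minimal sketch, assuming GT is present in the FORMAT field, that reads the IDs from the header and lists which samples carry a non-reference allele at a given record. The function names are hypothetical.

```python
import re
from typing import Iterable, List


def sample_ids(lines: Iterable[str]) -> List[str]:
    """Return the sample IDs from the #CHROM header line (columns after FORMAT)."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data lines without seeing a #CHROM header
    raise ValueError("no #CHROM header line found")


def carriers(record: str, samples: List[str]) -> List[str]:
    """Return the samples whose GT contains at least one non-reference allele."""
    cols = record.rstrip("\n").split("\t")
    fmt = cols[8].split(":")
    if "GT" not in fmt:
        return []
    gt_index = fmt.index("GT")
    result = []
    for name, data in zip(samples, cols[9:]):
        parts = data.split(":")
        # trailing FORMAT fields may be dropped for a sample; treat that as missing
        gt = parts[gt_index] if gt_index < len(parts) else "."
        # alleles are separated by '/' (unphased) or '|' (phased); '.' means missing
        alleles = re.split(r"[/|]", gt)
        if any(a not in ("0", ".") for a in alleles):
            result.append(name)
    return result
```

Reading the header once with sample_ids() and then calling carriers() on each record that passes a filter is one way to attach sample IDs to every retained variant.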
I've typically only used ANNOVAR with single-sample VCFs, but it looks possible if your VCF file is version 4.0, using -format vcf4 and -allsample: http://annovar.openbioinformatics.org/en/latest/misc/faq/
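A small sketch of how that conversion might be scripted from Python. The -format vcf4 and -allsample flags are the ones mentioned above; the script name convert2annovar.pl, the -outfile option, and the argument order are assumptions based on ANNOVAR's usual command-line usage and should be checked against the linked FAQ. The helper names are hypothetical.

```python
import subprocess
from typing import List


def convert2annovar_cmd(vcf_path: str, out_prefix: str,
                        script: str = "convert2annovar.pl") -> List[str]:
    """Build a (hypothetical) convert2annovar.pl call for every sample in a VCF 4.x file."""
    # ASSUMPTION: script name and -outfile option; -format vcf4 and -allsample as suggested above
    return ["perl", script, "-format", "vcf4", vcf_path,
            "-outfile", out_prefix, "-allsample"]


def run_conversion(vcf_path: str, out_prefix: str) -> None:
    """Run the conversion; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(convert2annovar_cmd(vcf_path, out_prefix), check=True)
```

Building the argument list separately makes it easy to print and inspect the exact command before launching it on a large file.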