Dear All,
I have a few questions about variant annotation. I have been using Annovar for a while now to annotate my variants in both VCF and text-based formats, which are the output of GATK and VarScan. Of late I have noticed that, upon annotating, more than 60% of the variants (which are SNVs) fall in intergenic and intronic regions and only a few are in exonic regions, even though my data has 75% exon coverage. I checked earlier how many of my reads fall in exonic regions using the SureSelect BED file that was used for the target enrichment, and found that it is well over 75%. So, after all the statistical tests and the removal of false positives, shouldn't I expect nearly 70% of the variants to lie in exonic regions after annotation?

Each of my samples has between 200 and 220 variants. These are somatic variants, and roughly 30% of them lie in exonic regions; the rest are in the genome_summary.csv file of the Annovar output. Is this a likely scenario, or am I missing something? Is it advisable to use some other annotation tool? If so, which tool can directly annotate both the VCF and the text-based variant files? I would appreciate any suggestions.
Thanks
Did you provide the target file when you called the variants with GATK and VarScan?
Yes, I did use the target BED file while following the GATK pipeline, in the steps before BQSR. In fact, I create the realigned, recalibrated BAM file with GATK and then use that BAM for variant calling with both GATK and VarScan, so the target enrichment file has already been applied to it. I have no idea how to use the target enrichment file with VarScan itself. Is there any way to do that in VarScan as well?
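
One way to check whether the final calls are actually restricted to the capture targets is to count how many variant positions fall inside the BED intervals. Below is a minimal Python sketch of that check, not a definitive implementation. The function names (`load_bed`, `on_target`, `on_target_fraction`) are made up for illustration, and it assumes a plain tab-separated VCF, a BED file with at least three columns, and the same chromosome naming in both files (for example both "chr1", or both "1").

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)}, sorted and merged."""
    raw = defaultdict(list)
    for line in lines:
        # Skip blank lines and the usual BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                # Overlapping or touching interval: extend the previous one.
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def on_target(targets, chrom, pos):
    """True if a 1-based VCF position lies in a 0-based, half-open BED interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    zero_based = pos - 1
    # Last interval whose start is <= the position, if any.
    i = bisect.bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def on_target_fraction(targets, vcf_lines):
    """Fraction of VCF records whose POS falls inside the target intervals."""
    total = hits = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        total += 1
        hits += on_target(targets, fields[0], int(fields[1]))
    return hits / total if total else 0.0
```

It could be used by opening the two files and passing the file handles, e.g. `on_target_fraction(load_bed(open(bed_path)), open(vcf_path))`, where `bed_path` and `vcf_path` are placeholders for your own files. A low fraction would suggest that the calls are not being limited to the targets. Keep in mind that "on target" is not the same thing as Annovar's "exonic" label, since capture designs generally include some sequence around the exons.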