Annotation using ANNOVAR
10.3 years ago
ivivek_ngs

Dear All,

I have a few questions. For a while now I have been using ANNOVAR to annotate my variants, in both VCF and text format, as output by GATK and VarScan. Lately I have noticed that after annotation more than 60% of my SNVs fall in intergenic and intronic regions and only a few are exonic, even though my data have 75% exon coverage. Earlier I checked how many of my reads fall in exonic regions using the SureSelect BED file used for target enrichment, and it was well over 75%. So after all the statistical tests and the removal of false positives, shouldn't I expect nearly 70% of the variants to lie in exonic regions after annotation?

All my samples have between 200 and 220 variants. These are somatic variants; roughly 30% of them are exonic, and the rest appear only in the genome_summary.csv file of the ANNOVAR output. Is this a likely scenario, or am I missing something? Is it advisable to use another annotation tool, and if so, which tool can directly annotate both VCF and text-based variant files? I would appreciate some suggestions.
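One way to check proportions like these is to tally the region category assigned to each annotated variant. The sketch below is a minimal illustration, assuming a tab-separated ANNOVAR-style table whose header includes a `Func.refGene` column; the column name, the helper `region_fractions`, and the toy rows are assumptions for illustration, not data from this post.

```python
import csv
import io
from collections import Counter


def region_fractions(table_text, column="Func.refGene"):
    """Return the fraction of variants in each region category.

    Assumes a tab-separated table with a header row containing `column`
    (assumed default; adjust it to match your own ANNOVAR output).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    # Count how many variants carry each region label (exonic, intronic, ...)
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Toy table with made-up rows, only to show the calculation
demo = (
    "Chr\tStart\tFunc.refGene\n"
    "chr1\t100\texonic\n"
    "chr1\t200\tintronic\n"
    "chr2\t300\tintronic\n"
    "chr3\t400\tintergenic\n"
)
print(region_fractions(demo))  # {'exonic': 0.25, 'intronic': 0.5, 'intergenic': 0.25}
```

Comparing these fractions with the fraction of calls that fall inside the capture targets can help separate a calling/filtering issue from a genuine biological pattern.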

Thanks

Did you provide the target file when you called the variants with GATK and VarScan?

Yes, I certainly used the target BED file while following the GATK pipeline, in the steps before BQSR. In fact, I create the realigned, recalibrated BAM file with GATK and then use that BAM for variant calling with both GATK and VarScan, and the target enrichment file has already been applied when producing it. I have no idea how to use the target enrichment file with VarScan. Is there a way to do that in VarScan as well?
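If the caller itself cannot be restricted to the targets, one workaround is to keep only the calls that fall inside the enrichment intervals after calling. A minimal sketch, assuming VCF-formatted calls and a standard BED target file; `load_bed`, `on_target` and `filter_vcf` are hypothetical helpers, not part of VarScan or GATK. Note that BED coordinates are 0-based and half-open, while VCF POS is 1-based.

```python
def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    targets = {}
    for line in lines:
        # Skip blank lines and BED header/track/browser lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    return targets


def on_target(chrom, pos, targets):
    """True if the 1-based VCF position `pos` lies inside any interval on `chrom`."""
    zero_based = pos - 1
    return any(start <= zero_based < end for start, end in targets.get(chrom, ()))


def filter_vcf(vcf_lines, targets):
    """Yield VCF header lines plus records whose CHROM/POS fall inside the targets."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if on_target(fields[0], int(fields[1]), targets):
            yield line


bed = load_bed(["chr1\t100\t200", "chr2\t0\t50"])
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t101\t.\tA\tG",  # first base of the chr1 interval: kept
    "chr1\t100\t.\tC\tT",  # one base before the interval: dropped
    "chr1\t200\t.\tG\tA",  # last base of the interval: kept
    "chr2\t50\t.\tT\tC",   # last base of the chr2 interval: kept
    "chr3\t10\t.\tA\tC",   # chromosome not in the targets: dropped
]
for record in filter_vcf(vcf, bed):
    print(record)
```

The linear scan over intervals is fine for a few hundred somatic calls; for large call sets, `bedtools intersect` does the same job more efficiently.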

10.3 years ago
arno.guille

You said: "Earlier I checked how many of my reads fall in exonic regions using the SureSelect BED file used for target enrichment, and it was well over 75%. So after all the statistical tests and the removal of false positives, shouldn't I expect nearly 70% of the variants to lie in exonic regions after annotation?"

But this assumption is false: there are many more mutations in intronic regions because of selection pressure, since coding sequence is under stronger purifying selection than intronic sequence.

So it is likely that, because of selection, the mutations would mostly be in intronic regions? What I mean is that the exome reads mostly map to exonic regions, but when I call variants from them, filter down to the more significant ones and annotate, I find more in intronic and intergenic regions than in exonic regions. Shouldn't the variation be much higher in exonic regions? I am not denying the effect of selection, but if this is exome data, shouldn't I see a fair number of mutations in exonic regions?

I also find a lot of significant hits in intronic, miRNA and intergenic regions in the exome data I am analyzing. These could be interesting, but their functional consequences are often very hard to establish, and people normally do not pursue such SNVs further unless they have a story in mind.

arno.guille: By the way, I had no idea that VarScan could take a target file when calling variants. Which parameter did you use?

In fact, I don't know how to use the target BED file when calling variants with VarScan either.

I don't use VarScan, but I thought there was such an option. It seems that is not the case.

No, as far as I know there is no such option in VarScan. Only in GATK can you use the target BED file: during BQSR, and again during calling with UnifiedGenotyper or HaplotypeCaller.