Hi,
When I use ANNOVAR to annotate my exome-seq data, I get two files: annovar.variant and annovar.exonic_variant.
My question is: if exome sequencing only targets exonic regions, why does ANNOVAR also report variants in other regions (intronic, intergenic, etc.)?
Thanks.
This paper also has a comparison of WES vs WGS, for those who find such things interesting...
The annovar.variant result has 144k rows, but the annovar.exonic_variant result has only about 15k rows. That is what confuses me.
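One way to see where the remaining rows fall is to tally the region label on each row. Here is a minimal sketch in Python. It assumes the file is tab-separated with the region label (exonic, intronic, intergenic, ...) in the first column, which is how ANNOVAR's gene-based output is normally laid out. The function name `count_regions` and the example rows are made up for illustration.

```python
from collections import Counter


def count_regions(lines):
    """Tally the region label found in the first tab-separated column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        region = line.split("\t", 1)[0]
        counts[region] += 1
    return counts


# Made-up rows: region, gene, then the original variant columns.
example = [
    "exonic\tGENE1\t1\t1000\t1000\tA\tG",
    "intronic\tGENE1\t1\t1500\t1500\tC\tT",
    "intergenic\tGENE1(dist=500),GENE2(dist=900)\t1\t9000\t9000\tG\tA",
    "intronic\tGENE2\t1\t12000\t12000\tT\tC",
]

for region, n in count_regions(example).most_common():
    print(region, n)
```

For a real file, something like `with open("annovar.variant") as fh: count_regions(fh)` should give the per-region breakdown, assuming the layout above holds.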
Thanks. I will look up the papers.
If you haven't done so already, I would use something like Picard to calculate your target region stats:
http://picard.sourceforge.net/picard-metric-definitions.shtml#HsMetrics
My guess is that you'll see a fair number of off-target reads, especially if you remove duplicates (for example, I think you are doing pretty well if you get 60% on-target unique reads).
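To make "on-target" concrete, here is a toy sketch in Python that counts the fraction of reads overlapping any target interval. This is not how Picard computes its metrics (use the HsMetrics report for real data). The function names, the half-open coordinates, and the example reads and targets are all assumptions for illustration.

```python
def overlaps(start, end, intervals):
    """True if the half-open range [start, end) overlaps any interval."""
    return any(start < t_end and t_start < end for t_start, t_end in intervals)


def on_target_fraction(reads, targets):
    """Fraction of reads overlapping at least one target interval.

    reads:   list of (chrom, start, end) tuples
    targets: dict mapping chrom -> list of (start, end) tuples
    """
    if not reads:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in reads
        if overlaps(start, end, targets.get(chrom, []))
    )
    return hits / len(reads)


# Made-up targets and (already de-duplicated) reads.
targets = {"chr1": [(100, 200), (500, 600)]}
reads = [
    ("chr1", 90, 140),   # overlaps first target
    ("chr1", 150, 200),  # inside first target
    ("chr1", 300, 350),  # between targets
    ("chr1", 590, 640),  # overlaps second target
    ("chr2", 100, 150),  # chromosome with no targets
]

print(on_target_fraction(reads, targets))
```

With these made-up reads, three of the five overlap a target, so the fraction is 0.6.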
It is also worth taking your total number of reads into consideration. Say one run gives you 80x on-target and 10x off-target coverage, while a run with half as many reads gives 40x and 5x. Doubling the coverage probably results in the same on-target variants, but you will have much higher power to detect off-target variants.
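A back-of-the-envelope way to see this is a simple binomial model, sketched below in Python. The assumptions are mine: a heterozygous variant where each read shows the alternate allele with probability 0.5, exactly the stated depth at the site, no sequencing error, and a hypothetical caller that needs at least 3 alternate reads. Real callers are more sophisticated, so treat the numbers as illustrative only.

```python
from math import comb


def detection_power(depth, min_alt_reads=3, alt_fraction=0.5):
    """Probability of seeing at least min_alt_reads alternate reads at a site.

    Uses a plain binomial model; min_alt_reads and alt_fraction are
    illustrative assumptions, not values from any particular caller.
    """
    p_too_few = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


for depth in (5, 10, 40, 80):
    print(f"{depth}x: {detection_power(depth):.3f}")
```

Under these assumptions, going from 5x to 10x raises the detection probability from 0.50 to about 0.95, while 40x and 80x are both essentially 1. That matches the intuition above: the extra reads matter off target, not on target.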