I have run Annovar on GATK output after inserting a row for the end locus (following the Annovar tutorial on preparing input files). The command I am using is:

annotate_variation.pl -out gatk -build hg19 example/gatkfile humandb/ -dbtype knownGene

so that I can get UCSC transcript annotations.
For simple insertions/deletions I can pull out the protein sequence from hg_19knownPep
to check whether the variant position information (for instance G952A) is correct. I wrote code to do this for all non-synonymous SNVs, and the Annovar annotations are correct for all of them.
On the other hand, this is not the case when I look at the RNA level. For instance, take the Annovar entry:
frameshift substitution NBPF8:uc031pny.1:exon2:c.116_116delinsGAA, chr1 144615250 144615250 G GAA
When I get the RNA sequence for uc031pny.1 from the Annovar hg19 reference file, I notice the following, which is confusing me:
- G is at chr1:144615250 on the forward strand in IGV (check).
- But when I get the mRNA sequence from Annovar's KnownGeneMrna, the nucleotide at position 116 is T.

This is fairly consistent across all the substitutions and deletions in the output. I think I'm misinterpreting something, but I'm not sure what. Any help would be excellent.
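For reference, the kind of position check described above can be sketched roughly as follows. This is only a minimal sketch, not Annovar's own logic. The helper names (read_fasta, base_at, complement), the toy record tx1, and the utr5_len parameter are all hypothetical, and it assumes the FASTA record ID is the first whitespace-delimited token of the header line. The optional utr5_len offset is there because HGVS-style c. coordinates are counted from the first base of the start codon rather than from the first base of the transcript, and the complement helper is there because a transcript on the minus strand will show the complement of the forward-strand base seen in IGV. Whether either of these explains the mismatch here is an assumption to be checked, not a conclusion.

```python
from typing import Dict, Iterable

# Complement table for comparing a forward-strand genomic base
# against the mRNA of a minus-strand transcript.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}.

    Assumption: the record ID is the first whitespace-delimited
    token after '>' on the header line.
    """
    chunks: Dict[str, list] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {rec_id: "".join(parts).upper() for rec_id, parts in chunks.items()}


def base_at(seq: str, pos: int, utr5_len: int = 0) -> str:
    """Return the base at 1-based position `pos`.

    `utr5_len` (hypothetical parameter) shifts the lookup past the
    5' UTR, for coordinates counted from the start codon.
    """
    index = utr5_len + pos - 1
    if not 0 <= index < len(seq):
        raise IndexError(f"position {pos} (offset {utr5_len}) is outside the sequence")
    return seq[index]


def complement(base: str) -> str:
    """Complement a single base (or a string of bases)."""
    return base.upper().translate(_COMPLEMENT)


# Toy record: 5 bases of 5' UTR (AAACC) followed by the coding sequence ATGGCT.
toy = read_fasta([">tx1 toy record", "AAACC", "ATGGCT"])
mrna = toy["tx1"]

from_transcript_start = base_at(mrna, 3)            # counted from mRNA base 1
from_start_codon = base_at(mrna, 3, utr5_len=5)     # counted from the A of ATG
```

In the toy record the same position number gives two different bases depending on where counting starts, which is the kind of disagreement described above. For a real transcript, the 5' UTR length would have to be derived from the coding-start and exon coordinates in the knownGene table, which this sketch does not attempt.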
Thanks,
Jeremy