Hi everybody,
I have a strange problem, which for sure has something to do with the version of some reference that I'm using.
I'm counting the number of exons of the transcript NM_003752 in the following way:
from Bio import Entrez
Entrez.email = 'xxxxxx'
handle = Entrez.efetch(db='Nucleotide', id='NM_003752', retmode='xml')
record = Entrez.read(handle)
feature_count = len(record[0]['GBSeq_feature-table'])
exoncount = 0
for i in range(0, feature_count):
if record[0]['GBSeq_feature-table'][i]['GBFeature_key'] == 'exon':
exoncount += 1
print(exoncount)
Output is 21. Also, if I count manually on the NCBI website (https://www.ncbi.nlm.nih.gov/nuccore/557129041/), I count 21 exons.
However, if I run ANNOVAR using the following input file:
16 28391160 28391160 G A
with the following command
annotate_variation.pl -out results -build hg19 annovar.avinput humandb/ --separate
I get this result:
line1 nonsynonymous SNV EIF3CL:NM_001317857:exon21:c.C2704T:p.R902C,EIF3C:NM_001286478:exon22:c.C2671T:p.R891C,EIF3CL:NM_001099661:exon21:c.C2704T:p.R902C,EIF3C:NM_001267574:exon21:c.C2701T:p.R901C,EIF3C:NM_001037808:exon22:c.C2701T:p.R901C,EIF3CL:NM_001317856:exon21:c.C2704T:p.R902C,EIF3C:NM_001199142:exon22:c.C2701T:p.R901C,EIF3C:NM_003752:exon22:c.C2701T:p.R901C, 16 28391160 28391160 G A
So it says that for transcript NM_003752, the resulting mutation will be in exon 22, which it shouldn't even have.
What's the problem here?
Thanks very much for helping out!
Best, Gero
@gero: I removed your email address from the post.
Thank you, that was not very considerate of me.
gero.knittel : You should email ANNOVAR authors and let them know if this discrepancy.