11.4 years ago
alexhli
I would like to look up the total number of exons in a transcript, starting with the NM# for this cDNA. For example, say I have a list of thousands of mutations which are formatted like this:
PRAMEF1:NM_023013:exon3:c.T314A:p.L105X
How can I tell if this mutation occurs in the last exon of the transcript? Or at least the first half of the total transcript? This issue is specifically related to nonsense-mediated decay.
Thanks
You could also use BioMart, with RefSeq mRNA ID as the filter and Ensembl Exon ID as the attribute. It returns 4 exon IDs for the example in the question.
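If you want to script that BioMart lookup rather than use the web form, something like the following might work. It is only a sketch in Python: it builds the query and the URL but does not send anything. The dataset name (`hsapiens_gene_ensembl`), the filter and attribute names (`refseq_mrna`, `ensembl_exon_id`) and the service URL are my assumptions about the Ensembl BioMart configuration, so check them against the current BioMart interface before relying on them.

```python
from urllib.parse import urlencode

# Assumed Ensembl BioMart REST endpoint; verify before use.
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(refseq_id):
    """Build a BioMart XML query: RefSeq mRNA ID as filter, exon ID as attribute."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        # Dataset, filter and attribute names below are assumptions.
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="refseq_mrna" value="{refseq_id}"/>'
        '<Attribute name="ensembl_exon_id"/>'
        "</Dataset>"
        "</Query>"
    )


def build_biomart_url(refseq_id):
    """Return the full request URL with the XML query URL-encoded."""
    return BIOMART_URL + "?" + urlencode({"query": build_biomart_query(refseq_id)})


# Print the URL for the example transcript from the question.
print(build_biomart_url("NM_023013"))
```

Counting the distinct exon IDs in the tab-separated response would then give the total number of exons for that transcript.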
The reason I suggested Perl is that if all the data is in the format shown above, then they're going to have to extract the RefSeq ID in some way anyway, and that is easily done with Perl's split function. Once you've started doing it that way, you may as well continue.
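To illustrate the splitting step, here is a minimal sketch. It is written in Python rather than Perl, but the idea is the same: split on the colon, pull out the RefSeq ID and the exon number, then compare the exon number with the transcript's total exon count. The function names are made up for this example, and the total of 4 exons is the BioMart figure quoted in this thread for NM_023013. In practice you would look that number up for each transcript.

```python
import re


def parse_annotation(annotation):
    """Split 'GENE:NM_ID:exonN:cDNA_change:protein_change' into named fields."""
    gene, refseq_id, exon, cdna_change, protein_change = annotation.split(":")
    match = re.fullmatch(r"exon(\d+)", exon)
    if match is None:
        raise ValueError(f"unexpected exon field: {exon!r}")
    return {
        "gene": gene,
        "refseq_id": refseq_id,
        "exon_number": int(match.group(1)),
        "cdna_change": cdna_change,
        "protein_change": protein_change,
    }


def is_last_exon(exon_number, total_exons):
    """True if the mutation falls in the final exon of the transcript."""
    return exon_number == total_exons


record = parse_annotation("PRAMEF1:NM_023013:exon3:c.T314A:p.L105X")
# total_exons=4 is the exon count reported by BioMart for this transcript.
print(record["refseq_id"], record["exon_number"],
      is_last_exon(record["exon_number"], total_exons=4))
```

This assumes every line has exactly five colon-separated fields; anything else raises an error instead of being silently misparsed.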
The other reason I suggested the API is the volume of data: with thousands of mutations to look up, BioMart can sometimes be a bit difficult with large numbers of items.