Hi Guys,
I really need some help with understanding a discrepancy between different RefSeq descriptions at NCBI. I've been trying to extract some info about different transcripts (representing gene isoforms) for more than a week now, but they simply seem to be inaccessible.
If you scroll down the default Gene page at NCBI (e.g. http://www.ncbi.nlm.nih.gov/gene/22061), at some point transcript-specific data from RefSeq is displayed (it's under the "NCBI Reference Sequences (RefSeq)" header). If the gene has a few different isoforms (like p63), the appropriate info about each of them is displayed. Similarly, if you click on a RefSeq record (e.g. NM_001127259.1 = http://www.ncbi.nlm.nih.gov/nuccore/NM_001127259.1), you can see a full description of the record WITH the comment about the transcript, e.g. "Transcript Variant: This variant (1) encodes the longest isoform (a, also known as TAp63alpha).".
Now, when I query the UCSC refSeqSummary table (SELECT * FROM refSeqSummary r WHERE r.mrnaAcc = "NM_001127259"), I only get the first part of this description (without the info about the transcript).
What database/table do I need to query to get the info I am missing? I do not know if that is important, but when I looked at the source code of NCBI's RefSeq record page, the tag containing the full comment has the class comment_189083803. So it seems that it is coming from a database of some kind...
I would be very grateful for any comments and hints.
Best wishes,
Adam
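One possible route, sketched here rather than offered as a confirmed answer: the "Transcript Variant" sentence quoted above is part of the COMMENT block of the record's GenBank flat file, which can be downloaded through NCBI's E-utilities efetch endpoint. The Python below is a minimal sketch under that assumption. The helper names (fetch_genbank, extract_comment, transcript_variant) are made up for illustration, and the regular expression assumes the field is a single sentence ending in a full stop.

```python
import re
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_genbank(accession):
    """Download the GenBank flat file for one nuccore accession (needs network)."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH}?{query}") as response:
        return response.read().decode("utf-8")


def extract_comment(genbank_text):
    """Return the COMMENT block as one string, with line wrapping undone."""
    parts = []
    in_comment = False
    for line in genbank_text.splitlines():
        if line.startswith("COMMENT"):
            in_comment = True
            parts.append(line[len("COMMENT"):].strip())
        elif in_comment:
            # A non-blank first column marks the next top-level keyword.
            if line[:1].strip():
                break
            parts.append(line.strip())
    return " ".join(part for part in parts if part)


def transcript_variant(comment):
    """Return the first sentence after 'Transcript Variant:', or None if absent."""
    # Assumption: the field is one sentence that ends with a full stop.
    match = re.search(r"Transcript Variant:\s*(.+?\.)(?=\s|$)", comment)
    return match.group(1) if match else None
```

With network access, transcript_variant(extract_comment(fetch_genbank("NM_001127259.1"))) should return the variant sentence if the record still carries it. Records whose variant description spans several sentences would need a looser pattern.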
You can find all the transcripts, but you can't select only the longest per gene. If there's a way to get only the longest in MartView that I haven't seen, please do let us know how :)
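As a workaround, the filtering can be done after export. The sketch below assumes you have already downloaded all transcripts with their lengths and loaded them as (gene ID, transcript ID, length) tuples. The function name and the IDs in the usage example are hypothetical, and ties are resolved by keeping the first transcript seen.

```python
def longest_transcript_per_gene(rows):
    """Pick the longest transcript for each gene.

    rows: iterable of (gene_id, transcript_id, length) tuples.
    Returns a dict mapping gene_id -> (transcript_id, length).
    """
    longest = {}
    for gene_id, transcript_id, length in rows:
        best = longest.get(gene_id)
        # Strict '>' keeps the first transcript seen when lengths tie.
        if best is None or length > best[1]:
            longest[gene_id] = (transcript_id, length)
    return longest


# Hypothetical rows, standing in for an exported transcript table.
example_rows = [
    ("geneA", "txA1", 1200),
    ("geneA", "txA2", 3400),
    ("geneB", "txB1", 800),
]
result = longest_transcript_per_gene(example_rows)
```

Here result maps geneA to txA2 and geneB to txB1.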