How to get annotations for an old RefSeq transcript ID
2.9 years ago
adixon3

I have 10000+ hg19 RefSeq IDs (with version numbers) that were generated last year, and I would like to query RefSeq for all the annotations associated with them (CDS, exons, UTRs, etc.). However, because many of the IDs have since been updated, simply getting a table from the UCSC Table Browser always misses a good chunk of them.

I have tried using NCBI's archived tables, but not all of our IDs exist in them. Is there a way to get these annotations through a database query, something like Entrez E-utilities but for RefSeq?

Example IDs that I can't find information for (there are many more; these are just a few):

NM_000017.3
NM_000018.3
NM_000019.3
NM_000020.2

2.9 years ago
MirianT_NCBI

Hi, you can use the NCBI Datasets command-line tool. I checked the accessions you posted and they are all available. You can enter the accessions one at a time or as a list, like this:

One accession at a time:

datasets download gene accession NM_000017.3

Or as a list of accessions. Assuming your file is called accessions.txt, with one accession per line:

datasets download gene accession --inputfile accessions.txt

If you use the list of accessions, all sequences are downloaded into a single FASTA file. If that's OK for you, great. If not, you can loop over your list so that each gene is downloaded as a separate data package (zip file) named after its accession:

while IFS= read -r GENE; do datasets download gene accession "${GENE}" --filename "${GENE}.zip"; done < accessions.txt

I hope it helps. Let me know if you run into any issues.

EDIT: if you want to look at the metadata instead of downloading the data itself, replace download with summary in the datasets command:

datasets summary gene accession NM_000017.3

Datasets prints JSON output to the screen, which you can redirect to a file if you prefer.


Thank you! I figured something like this existed, but could not seem to find it easily.

There's a slight wrinkle in the solution, though: if a newer version of the transcript exists, it returns that data instead of the older one:

    "warnings":[{
      "gene_warning_code":"ACCESSION_VERSION_MISMATCH",
      "message":"The current accession.version will be returned.",
      "reason":"The accession.version you requested is no longer current or otherwise unrecognized.",
      "replaced_id":{"requested":"NM_000017.3","returned":"NM_000017.4"}
    }]

Any idea how to force the query even if it's out-of-date?
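Before settling on a workaround, it can help to know exactly which of the 10000+ IDs were silently swapped for a newer version. A minimal sketch, assuming the `datasets summary` output is compact JSON in which each warning carries a `replaced_id` object written exactly as in the warning above; `list_replaced_ids` is a hypothetical helper name, and a proper JSON tool such as `jq` would be more robust if the layout differs:

```bash
#!/usr/bin/env bash
# Print "requested<TAB>returned" for every replaced_id object found in the
# `datasets summary` JSON read on stdin.
# Assumption: compact JSON where each pair is written exactly as
#   "replaced_id":{"requested":"X","returned":"Y"}
# (as in the warning shown above). Use a real JSON parser if it is not.
list_replaced_ids() {
  # grep exits non-zero when nothing matches; "|| true" keeps that quiet.
  # Splitting each match on double quotes puts the IDs in fields 6 and 10.
  { grep -o '"replaced_id":{"requested":"[^"]*","returned":"[^"]*"}' || true; } |
    awk -F'"' '{ print $6 "\t" $10 }'
}
```

For example, `datasets summary gene accession NM_000017.3 | list_replaced_ids` should print the requested and the returned accession separated by a tab, and nothing at all when no replacement happened.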

Hi! NCBI Datasets does not have that option. Here are three NCBI FTP links that might have the info/files you need:

I hope it helps you find what you need. From the list of accessions you posted, I found NM_000017.3 in the 2017 release and NM_000018.3, NM_000019.3, NM_000020.2 in the 2014 and 2017 releases.
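To check a whole list against one of those archive files instead of searching for IDs by hand, a small awk sketch can report which accession.versions appear anywhere in the file. It assumes the downloaded file has been decompressed to plain tab-delimited text in which each accession.version is a field of its own; the helper name `report_found` and the file names in the usage line are hypothetical:

```bash
#!/usr/bin/env bash
# Report whether each accession.version listed in $1 (one per line) occurs
# as a whole field anywhere in the tab-delimited file $2.
# Output: one line per requested ID, "ID<TAB>found" or "ID<TAB>missing".
# Assumption: the archive file is plain text with tab-separated columns.
report_found() {
  awk -F'\t' '
    # First file: the wanted IDs (strip Windows CR endings, skip blanks).
    NR == FNR { sub(/\r$/, ""); if ($0 != "") { want[$0] = 0; order[++n] = $0 }; next }
    # Second file: mark every field that exactly matches a wanted ID.
    { for (i = 1; i <= NF; i++) if ($i in want) want[$i] = 1 }
    # Report in the original input order.
    END { for (k = 1; k <= n; k++) print order[k] "\t" (want[order[k]] ? "found" : "missing") }
  ' "$1" "$2"
}
```

Usage: `report_found accessions.txt release_file.tsv > presence.tsv` lists each requested ID followed by `found` or `missing`, so the leftovers can be checked against the next archive file.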

If you still need help retrieving the files, please post here again or feel free to reach out to the NCBI Helpdesk at info@ncbi.nlm.nih.gov with NCBI Datasets in the subject line.

2.9 years ago
GenoMax

Using Entrez Direct (output truncated to save space):

$ esearch -db nuccore -query NM_000017.3 | efetch -format ft
>Feature ref|NM_000017.3|
1   1964    gene
            gene    ACADS
            gene_syn    ACAD3
            gene_syn    SCAD
            gene_desc   acyl-CoA dehydrogenase short chain
            db_xref GeneID:35
            db_xref HGNC:HGNC:90
            db_xref MIM:606885
1   194 exon
            inference   alignment:Splign:2.1.0
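The feature table above already carries the coordinates that were asked for; it only needs flattening. A minimal awk sketch (hypothetical helper `ft_intervals`) that prints one row per interval, assuming the tab-separated columns of NCBI's 5-column feature table format (the pasted output above shows the tabs as spaces). Optional arguments keep only the named feature keys, and continuation intervals of a multi-part feature inherit the key from the line above:

```bash
#!/usr/bin/env bash
# Flatten an NCBI 5-column feature table (efetch -format ft) read on stdin
# into one row per interval: accession, feature key, start, stop.
# Optional arguments restrict output to those feature keys, e.g. exon CDS.
# Partial-location markers such as "<1" or ">1964" are kept verbatim.
ft_intervals() {
  awk -F'\t' -v keys="$*" '
    BEGIN { n = split(keys, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
    # Header line, e.g. ">Feature ref|NM_000017.3|": keep the middle part.
    /^>Feature/ { split($0, h, "|"); acc = h[2]; next }
    # Qualifier lines (and blank lines) start with an empty column.
    $1 == "" { next }
    # Interval line: column 3 holds the key on the first interval of a
    # feature and is empty on continuation intervals.
    {
      if ($3 != "") key = $3
      if (n == 0 || (key in want)) print acc "\t" key "\t" $1 "\t" $2
    }
  '
}
```

Usage: `esearch -db nuccore -query NM_000017.3 | efetch -format ft | ft_intervals exon CDS`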

I spent hours looking for a solution before finding this post!

Though I'm late to the party: given the way @adixon3 described the data he wants, I would suggest the GenBank format:

esearch -db nuccore -query NM_000017.3 | efetch -format gb
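If only the features are needed from those GenBank records, a rough awk sketch (hypothetical helper `gb_features`) can reduce each record to accession.version, feature key and location. It assumes the standard flat-file layout: feature keys indented by exactly five spaces, qualifiers starting with `/`, and location strings without internal spaces, so a long `join(...)` split over several lines is glued back together:

```bash
#!/usr/bin/env bash
# Print accession.version, feature key and location for each feature in the
# GenBank flat file(s) read on stdin. Optional arguments keep only those keys.
# Assumed layout: keys indented by exactly five spaces, qualifiers starting
# with "/", and location strings containing no spaces.
gb_features() {
  awk -v keys="$*" '
    # Print the pending feature (if it passes the key filter) and reset it.
    function flush() {
      if (key != "" && (n == 0 || (key in want))) print acc "\t" key "\t" loc
      key = ""
    }
    BEGIN { n = split(keys, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
    /^VERSION/  { acc = $2 }
    /^FEATURES/ { infeat = 1; next }
    # Any other line starting in column 1 (ORIGIN, CONTIG, //) ends the table.
    /^[^ ]/     { flush(); infeat = 0; next }
    !infeat     { next }
    # New feature line: the key, then the first part of its location.
    /^     [^ ]/ { flush(); key = $1; loc = $2; inloc = 1; next }
    # A qualifier line ends any multi-line location.
    /^ +\//     { inloc = 0; next }
    # Remaining indented lines continue the location or a qualifier value.
    inloc       { loc = loc $1 }
  '
}
```

Usage: `esearch -db nuccore -query NM_000017.3 | efetch -format gb | gb_features exon CDS`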

Just followed the installation instructions here and I was set in a few minutes: https://www.ncbi.nlm.nih.gov/books/NBK179288/
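For the full list of 10000+ IDs, sending one request per accession is slow. A minimal batching sketch, assuming that `efetch -id` accepts a comma-separated list of accessions and that nuccore resolves older accession.versions, as the `NM_000017.3` example above suggests. The helper names `make_batches` and `fetch_genbank`, the batch size of 200 and the one-second pause are illustrative choices, not NCBI recommendations:

```bash
#!/usr/bin/env bash
# Print the accessions in file $1 as comma-separated batches of size $2
# (default 200), one batch per line. CR characters and blank lines are dropped.
make_batches() {
  local file=$1 size=${2:-200}
  tr -d '\r' < "$file" | awk -v size="$size" '
    NF == 0 { next }
    {
      # Start a new batch every $size IDs, otherwise append with a comma.
      line = (count % size == 0) ? $1 : line "," $1
      count++
      if (count % size == 0) print line
    }
    # Emit the final, partially filled batch.
    END { if (count % size != 0) print line }
  '
}

# Fetch GenBank records for every batch (requires EDirect and network access).
fetch_genbank() {
  make_batches "$1" "${2:-200}" | while IFS= read -r ids; do
    # stdin is redirected so efetch cannot swallow the remaining batches.
    efetch -db nuccore -id "$ids" -format gb < /dev/null
    sleep 1  # precaution against NCBI request-rate limits (assumed value)
  done
}
```

Usage: `fetch_genbank accessions.txt > old_versions.gb` writes all records to one file; the VERSION line of each record shows which accession.version was actually returned.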

Big Thanks from my side.
