I have more than 10,000 hg19 RefSeq IDs (with version numbers) that were generated last year, and I would like to query RefSeq for all the annotations associated with them (CDS, exons, UTRs, etc.). However, because many of the IDs have since been updated, a table pulled from the UCSC Table Browser is always missing a good chunk of them.
I have tried using the archived tables from NCBI here, but not all of our IDs exist in those tables. Is there not a way to get these annotations through a database query like Entrez E-Utilities but for RefSeq?
EXAMPLE IDs that I can't find info for (there are MANY more, just giving some examples):
NM_000017.3
NM_000018.3
NM_000019.3
NM_000020.2
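For reference, the kind of versioned lookup being asked about could be sketched with the E-utilities `efetch` endpoint against the `nuccore` database. This is only a rough sketch: the helper names `efetch_url` and `fetch_records` are made up for illustration, and whether NCBI still serves every superseded version this way is an assumption that nothing in this thread confirms.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, rettype="gb"):
    """Build an efetch URL for a batch of versioned accessions (hypothetical helper)."""
    params = {
        "db": "nuccore",              # nucleotide database, which holds RefSeq transcripts
        "id": ",".join(accessions),   # keep the ".N" version suffix on every ID
        "rettype": rettype,           # "gb" requests GenBank flat files (CDS, exon features, ...)
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_records(accessions):
    """Download the records as text. Needs network access; not called below."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


ids = ["NM_000017.3", "NM_000018.3", "NM_000019.3", "NM_000020.2"]
print(efetch_url(ids))
```

With 10,000+ IDs the list would have to be sent in smaller batches, and NCBI's usage guidelines on request rate would apply.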
Thank you! I figured something like this existed, but could not seem to find it easily.
There's a slight wrinkle in the solution, though: if a newer version of the transcript exists, it returns that data instead of the older version.
Any idea how to force the query even if it's out-of-date?
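In the meantime, the silent substitution can at least be detected by comparing the versions that were requested with the versions that came back. A minimal sketch, assuming the returned accessions have already been collected into a list; the function names and the returned values in the example are made up for illustration, not real query output:

```python
def split_accession(accession):
    """Split 'NM_000017.3' into ('NM_000017', 3); version is None if absent."""
    base, _, version = accession.partition(".")
    return base, int(version) if version else None


def find_version_mismatches(requested, returned):
    """Return (requested, returned) pairs where the same transcript came back
    under a different version than the one asked for."""
    returned_by_base = {split_accession(acc)[0]: acc for acc in returned}
    mismatches = []
    for req in requested:
        base, _ = split_accession(req)
        got = returned_by_base.get(base)
        if got is not None and got != req:
            mismatches.append((req, got))
    return mismatches


requested = ["NM_000017.3", "NM_000018.3"]
returned = ["NM_000017.4", "NM_000018.3"]  # hypothetical values for illustration
print(find_version_mismatches(requested, returned))
```

Any accession that shows up in the mismatch list is one whose old version would have to be retrieved from an archived release instead.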
Hi! NCBI Datasets does not have that option. I'm pasting here three NCBI FTP links that might have the info/files you need:
I hope it helps you find what you need. From the list of accessions you posted, I found NM_000017.3 in the 2017 release and NM_000018.3, NM_000019.3, NM_000020.2 in the 2014 and 2017 releases.
If you still need any help retrieving the files, please post here again or feel free to reach out to the NCBI Helpdesk at info@ncbi.nlm.nih.gov and add NCBI Datasets to the subject line.