Those look like RefSeq transcript IDs. You can download the current RefSeq release here: Refseq vertebrate mammalian. The *.rna.gbff.gz files in this directory contain a GenBank record for each RefSeq ID, and each record specifies the latest version. You just need to grab the 'ACCESSION' and 'VERSION' values from each record. For example:
ACCESSION XM_002714324
VERSION XM_002714324.1 GI:291395911
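A minimal sketch of pulling those two values out of a downloaded file, assuming the usual GenBank flat-file layout where 'ACCESSION' and 'VERSION' start their lines and '//' ends each record. The function names are only illustrative:

```python
import gzip


def iter_accession_versions(lines):
    """Yield (accession, versioned_accession) for each GenBank record."""
    accession = None
    for line in lines:
        if line.startswith("ACCESSION"):
            fields = line.split()
            # First token after the keyword is the primary accession.
            accession = fields[1] if len(fields) > 1 else None
        elif line.startswith("VERSION"):
            fields = line.split()
            if accession and len(fields) > 1:
                # e.g. ("XM_002714324", "XM_002714324.1")
                yield accession, fields[1]
        elif line.startswith("//"):
            # End of record: reset before the next one.
            accession = None


def read_gbff(path):
    """Stream (accession, version) pairs from a gzipped .gbff file."""
    with gzip.open(path, "rt") as handle:
        yield from iter_accession_versions(handle)
```

You would call `dict(read_gbff(path))` on each *.rna.gbff.gz file you downloaded to build a lookup from unversioned ID to its current versioned ID.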
Another option is to use the NCBI E-utilities. For example, use esearch to get the uid for each RefSeq ID, then query E-utilities again with that uid to get the RefSeq ID with its current version number.
The following returns XML for 'NM_000014' (note that no version is specified here) containing the uid '66932946':
The following returns XML for the uid '66932946':
This XML contains the line: gi|66932946|ref|NM_000014.4|[66932946]
This tells you that the RefSeq transcript is currently at version 4. Of course, you would need a script to automate this process for the number of records that you have.
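A rough sketch of what such a script might look like, using only the Python standard library. It assumes the first call is esearch against the 'nucleotide' database and the second is esummary; the endpoint for the second step and the helper names are my assumptions, so check the responses you actually get back before relying on it:

```python
import re
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def fetch_xml(endpoint, **params):
    """GET an E-utilities endpoint and parse the XML response."""
    url = f"{EUTILS}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())


def version_from_text(text, accession):
    """Find 'ACCESSION.N' in text and return N as an int, else None."""
    match = re.search(r"\b" + re.escape(accession) + r"\.(\d+)", text)
    return int(match.group(1)) if match else None


def latest_version(accession):
    """Look up the current version of an unversioned RefSeq ID."""
    search = fetch_xml("esearch.fcgi", db="nucleotide", term=accession)
    uid = search.findtext("IdList/Id")  # first hit only (assumption)
    if uid is None:
        return None
    summary = fetch_xml("esummary.fcgi", db="nucleotide", id=uid)
    # Scan all text in the summary for the versioned accession.
    return version_from_text(" ".join(summary.itertext()), accession)


def latest_versions(accessions, delay=0.4):
    """Map each accession to its version, pausing between requests."""
    result = {}
    for acc in accessions:
        result[acc] = latest_version(acc)
        time.sleep(delay)  # stay under NCBI's request rate limit
    return result
```

For more than a few thousand IDs, the FTP route is probably the faster option, since this makes two requests per record.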
Can you explain more about the directory structure of the files in RefSeq vertebrate mammalian? I saw 100 sets of files there, with around 6-10 files in each set. If I only need human mRNA transcripts, which groups of files should I download? FYI, I'm really new to biology but have a very strong background in computer science.
There are 6 data sets in the RefSeq FTP directory, each with a specific file format. Each of these 6 data sets is divided into 144 blocks to avoid very large files. This is sort of explained here: The six file types are: genomic.fna (genome data in FASTA nucleic acid format), genomic.gbff (genome data in GenBank flat file format), protein.faa (protein data as FASTA amino acid), protein.gpff (protein data as GenPept flat file), rna.fna (RNA data as FASTA nucleic acid), rna.gbff (RNA as GenBank flat file)
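Since you only need the transcript records, you can ignore everything except one of the two rna file types. A small sketch of picking those out of a directory listing, assuming the file names end in the type followed by '.gz' (the example names below are illustrative, not copied from the server):

```python
def select_files(names, kind="rna.gbff"):
    """Return the file names of one RefSeq file type, sorted."""
    suffix = f".{kind}.gz"
    return sorted(name for name in names if name.endswith(suffix))


# Hypothetical listing covering two blocks of the release.
listing = [
    "vertebrate_mammalian.1.rna.gbff.gz",
    "vertebrate_mammalian.1.rna.fna.gz",
    "vertebrate_mammalian.1.protein.faa.gz",
    "vertebrate_mammalian.2.rna.gbff.gz",
]
rna_records = select_files(listing)
```

Use kind="rna.fna" instead if you only want the sequences in FASTA and not the full GenBank annotations.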
If you go the RefSeq FTP route, it might be more convenient to work with the human-specific files here: