Download Older Versions Of Refseq?
11.3 years ago
Hmm ▴ 500

I am trying to bulk download, or programmatically access, the FASTA sequences for the following RefSeq IDs (which are from an older RefSeq version):

NM_005656.1
NM_016211
NM_015668.3
NM_014584.1
NM_001134673.3
NM_001040454
NM_138799.2
NM_002880.3
NM_001098811.1
NM_014447.2
NM_033547.3
NM_032360.3
NM_015103.2
NM_016185.2
NM_001042492.2
NM_018361.3
NM_005930.3
NM_022486.3
NM_013234.2
NM_000565.2
NM_006328.3
NM_024963.4
NM_032852.2
NM_001500.2
NM_003057.2
NM_005422.2
NM_023110.2
NM_033389.2
NM_001033604.1
NM_004822.2
NM_001007267.1
NM_002530
NM_007371.3
NM_000142
NM_004956.3
NM_004449.3
AY204740.1
NM_003176.2
NM_001134999.1
NM_005400.2
NM_001351.2
NM_014423.3
NM_002223.2
NM_033393.2
NM_016052.3
NM_001017395.1
NM_173477.2
NM_001094.4
NM_024596.2
NM_003616.2
NM_005156.5
NM_016593.3
NM_020452.3
NM_018026.2
NM_030793.3
NM_003719.2
NM_001014433.2
NM_001130047.1
NM_025069.1
NM_015355.2
NM_138295.2
NM_016836.2

I have looked at this site, but it did not help: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-catalog/archive/ Any comments are really appreciated.

11.3 years ago
Hamish ★ 3.3k

Spot checking a few of the RefSeq (nucleotide) accessions in your list, most of them appear in the current RefSeq release and are thus available from the standard RefSeq sources.

Since you are only interested in the sequence, and the identifiers you are using are specific to the version of the sequence in RefSeq, you could fetch the sequences for the entries where these versions are available in the current RefSeq release. You will then have a set of sequences for the ones you could find, and a list of identifiers which are going to require a specific look-up since those sequences are not in the current release.
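One way to do that split, sketched in Python under the assumption that you already have a FASTA file from the current release on disk: collect the accession.version identifiers from its header lines, then sort the wanted identifiers into those that are present and those that need a separate look-up. The function names and the regular expression are illustrative choices, not anything prescribed by NCBI.

```python
import re

# Assumed identifier shape: letters, optional underscore, digits, ".", version.
ACC_VER = re.compile(r"[A-Z]+_?\d+\.\d+")


def versions_in_fasta(lines):
    """Collect accession.version identifiers from FASTA header lines."""
    found = set()
    for line in lines:
        if line.startswith(">"):
            match = ACC_VER.search(line)
            if match:
                found.add(match.group(0))
    return found


def partition(wanted, available):
    """Split wanted identifiers into (present, missing) relative to available."""
    accessions = {v.split(".")[0] for v in available}
    present, missing = [], []
    for ident in wanted:
        # An identifier without a version suffix matches any current version.
        if ident in available or ("." not in ident and ident in accessions):
            present.append(ident)
        else:
            missing.append(ident)
    return present, missing
```

With a real file you would pass an open file handle to `versions_in_fasta`, then handle the `missing` list entry by entry as described below.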

Unfortunately, historic entries are not available via E-utilities ESearch (it might be a good idea to let NCBI know that you would find that useful), so the remaining sequences have to be obtained by other means. It turns out that the old entries are available via the NCBI Entrez web interface. For example, NM_005656.1 has been obsoleted, but is still available in the web interface:

http://www.ncbi.nlm.nih.gov/nuccore/NM_005656.1

Please note that scripting the NCBI web interface is a good way to get blacklisted by NCBI services. So you'll want to do this bit manually...

However, only one old entry can be retrieved at a time, so it is not possible to perform a single query with all the old accessions and get the whole set in one go. Thus, for each of your obsolete accessions, look up the entry in Entrez, selecting 'fasta' as the desired format. You might find this convenient to do by generating URLs like:

http://www.ncbi.nlm.nih.gov/nuccore/NM_005656.1?report=fasta
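A minimal sketch for generating such URLs from a list of identifiers, to open by hand in a browser. The helper name is made up for illustration; the URL pattern is the one shown above.

```python
from urllib.parse import quote

BASE = "http://www.ncbi.nlm.nih.gov/nuccore/"


def fasta_view_url(identifier):
    """Build the Entrez web URL that shows one entry in FASTA format."""
    return f"{BASE}{quote(identifier)}?report=fasta"


# Print one link per obsolete identifier; these are meant for manual use,
# not for scripted downloading.
for ident in ["NM_005656.1"]:
    print(fasta_view_url(ident))
```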

Then use the "Send-to" menu to save the sequence as fasta to a file.

Once you have all of the additional sequences, you can concatenate them together with the sequences from the current records to give the whole set.
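The concatenation step could look like the following sketch, which assumes each saved file is plain-text FASTA. The function name is hypothetical. It adds a trailing newline where one is missing, so that a header never ends up glued onto the end of the previous sequence.

```python
from pathlib import Path


def merge_fasta(sources, destination):
    """Concatenate FASTA files into destination; return the record count."""
    records = 0
    with open(destination, "w") as out:
        for source in sources:
            text = Path(source).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next header on its own line
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return records
```

Comparing the returned record count with the length of your identifier list is a quick check that nothing was dropped.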

Given that only a few of the accessions in your list appear to be obsolete, this should not be too big a problem.

Update: as Hmm notes, old entries are available via E-utilities EFetch (they are not available in ESearch, so you cannot use Entrez queries to find them).
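A rough sketch of the EFetch route in Python, using only the standard library. It assumes the usual EFetch endpoint with `db=nuccore`, `rettype=fasta` and `retmode=text`, and requests one versioned identifier at a time. The half-second pause is my own conservative choice to stay well within NCBI's request-rate guidance, and the function names are illustrative.

```python
import time
import urllib.request
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(identifier):
    """Build an EFetch URL requesting one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": identifier, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(identifiers, pause=0.5):
    """Fetch each identifier in turn and return the combined FASTA text."""
    records = []
    for identifier in identifiers:
        with urllib.request.urlopen(efetch_url(identifier)) as response:
            records.append(response.read().decode("utf-8"))
        time.sleep(pause)  # assumed polite delay between requests
    return "".join(records)


# Example (performs network requests, so it is left commented out):
# print(fetch_fasta(["NM_005656.1", "NM_015668.3"]))
```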

@Hamish: Thanks. Although the entry is in the current version, when I access an ID such as NM_005656.1 it displays the message: "This sequence has been updated. See current version." The issue is that I downloaded the whole latest mrna_refseq file and could not locate the ID NM_005656.1 in it.

Maybe I have not been clear about the difference between entry accessions (e.g. NM_005656) and sequence versions (e.g. NM_005656.1). Due to the nature of the databases, the sequences associated with the entries are sometimes updated, say to correct sequencing errors, or to incorporate information about sequence variation and the most common form of the sequence. To allow a specific sequence to be specified, rather than whichever sequence is currently associated with the entry, a version number suffix is used to indicate the revision of the sequence.

So in this case while accession NM_005656 is in the current version of RefSeq, the current sequence version is NM_005656.3. The sequence version NM_005656.1 refers to an older version of the sequence. Thus when looking this up at NCBI you get a message telling you this is not the current sequence for the entry, and a pointer to the current entry with the current sequence. You might find the revision history view for NM_005656 helpful to understand what is happening, and to see how this entry has changed over time.
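To make the distinction concrete, here is a small Python sketch that separates the accession from the version suffix. The function names are made up for illustration.

```python
def split_version(identifier):
    """Return (accession, version); version is None if there is no suffix."""
    accession, dot, version = identifier.partition(".")
    return accession, (int(version) if dot else None)


def same_entry(a, b):
    """True if two identifiers refer to the same entry, ignoring the version."""
    return split_version(a)[0] == split_version(b)[0]
```

So `NM_005656.1` and `NM_005656.3` belong to the same entry but name different sequence versions, while a bare `NM_005656` names the entry without pinning a version.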

Please note that the sequence version suffix is commonly used across the various sequence databases. However, in UniProtKB there is a slight complication, because the same syntax is used for both sequence versions and entry versions. Since the entry is updated more often than the sequence, when dealing with UniProtKB versioned accessions you need to know which type they are before attempting to look them up. For example:

  • Revision history from UniSave: P06213
  • Entry version P06213.3 from Swiss-Prot 13.0 (01-JAN-1990)
  • Sequence version P06213.3 from UniProtKB 2010_09 (10-AUG-2010)
11.3 years ago
Ian 6.1k

You could try using BioMart via Ensembl. I am not sure what you want to extract, but by using the Ensembl archives you could enter the RefSeq IDs as a gene filter.

E-utilities can see obsolete identifiers in EFetch but not in ESearch, which means that Entrez queries for obsolete identifiers fail.

Retrieving Ensembl entries from ENA does not work, since the identifiers are not part of their database:

$ wget -q -O - "http://www.ebi.ac.uk/ena/data/view/ENST00000394480&display=fasta"
Entry: ENST00000394480 not found.

However you can get them from dbfetch/WSDbfetch:

wget -q -O - "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/ensembltranscript/ENST00000394480/fasta"

Or from:
