RefSeq mRNA to CDS Sequence
13.1 years ago
Woa ★ 2.9k

I have a long list of RefSeq mRNA IDs for a particular organism. I wish to download all the corresponding coding sequences (CDS) in FASTA format, where available. Is there any suitable tool or script for doing this automatically?

Thanks in advance

WoA

refseq cds • 7.8k views

Which organism?


Mouse (Mus musculus)

13.1 years ago
  • Go to the table browser http://genome.ucsc.edu/cgi-bin/hgTables
  • select group "Gene", track "RefSeq", table "refGene"
  • click "identifiers: paste list" and copy+paste your list
  • output format: CDS fasta
  • get output
  • Formatting options: unselect everything but "Show nucleotides"
  • get output
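
Before pasting, it can help to tidy the ID list. A minimal sketch, assuming the IDs sit one per line in a text file and that the refGene table stores accessions without a version suffix (so NM_001127388.2 would have to become NM_001127388). The function name clean_refseq_ids is made up for illustration:

```python
def clean_refseq_ids(lines):
    """Return unique, unversioned RefSeq accessions in input order."""
    seen = set()
    cleaned = []
    for line in lines:
        acc = line.strip()
        if not acc:
            continue  # skip blank lines
        # Assumption: refGene names carry no version, so drop a trailing ".N"
        acc = acc.split(".", 1)[0]
        if acc not in seen:
            seen.add(acc)
            cleaned.append(acc)
    return cleaned


if __name__ == "__main__":
    example = ["NM_001127388.2\n", "NM_001127389\n", "\n", "NM_001127388\n"]
    # Print one accession per line, ready to paste into the identifier box
    print("\n".join(clean_refseq_ids(example)))
```

Removing duplicates first also makes it easier to compare the number of IDs submitted with the number of sequences returned.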
13.1 years ago
Neilfws 49k

Normally I would suggest BioMart for this purpose (assuming that your organism is in BioMart) but as I write, it is giving an error. However, here's the procedure for when they fix it:

  1. Select MARTVIEW in the top menu
  2. Choose database Ensembl genes 64, select dataset for your organism
  3. Click Filters, left menu; expand "Gene"; check "ID list limit"; select "Refseq mRNA IDs"
  4. Paste or upload IDs
  5. Click Attributes, left menu; select "Sequences"; expand "SEQUENCES"; select "Coding sequence"
  6. Click "Results", top-left menu.
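
The same selection can also be written as a BioMart XML query rather than clicked through. This is only a sketch: the dataset name "mmusculus_gene_ensembl", the filter name "refseq_mrna" and the attribute name "coding" are my assumptions about how the Ensembl mart labels these fields, so check them against the mart you are using. The helper build_biomart_query is hypothetical and only builds the XML; it does not contact any server.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(refseq_ids, dataset="mmusculus_gene_ensembl"):
    """Build a BioMart XML query requesting CDS FASTA for RefSeq mRNA IDs.

    The dataset, filter and attribute names are assumptions, not verified
    against a live mart.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filter: restrict to the given RefSeq mRNA IDs (comma-separated list)
    ET.SubElement(ds, "Filter", {"name": "refseq_mrna", "value": ",".join(refseq_ids)})
    # Attributes: the ID for the FASTA header, plus the coding sequence itself
    ET.SubElement(ds, "Attribute", {"name": "refseq_mrna"})
    ET.SubElement(ds, "Attribute", {"name": "coding"})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_biomart_query(["NM_001127388", "NM_001127389"]))
```

The resulting string would then be submitted to the mart's query service as the query parameter. For a long list it is probably safer to send the IDs in batches rather than in one request.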

Currently, this gives the error "Serious Error: Error during query execution: Table 'ensembl_mart_64.ox_RefSeq_mRNA__dm' doesn't exist" - I will report this to BioMart.


Message from Ensembl: "This is a known bug in BioMart for release 64. See the known bugs page here: http://www.ensembl.info/contact-us/known-bugs/. This bug will be fixed for release 65 due out in November."

13.1 years ago
brentp 24k

If you're willing to try an in-development library, you can try cruzdb. With a script like this:

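from cruzdb import Genome

# 'hg19' is the human assembly; for mouse, substitute a mouse assembly such as 'mm9'
db = Genome('hg19')

refGene = db.refGene

with open("names.txt") as handle:
    for name in (line.strip() for line in handle):
        if not name:
            continue  # skip blank lines
        # one() raises if the ID is absent or matches more than one row
        gene = refGene.filter_by(name=name).one()
        print(">%s" % name)
        print("".join(gene.cds_sequence))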

and names.txt containing IDs, one per line, such as NM_001127388 and NM_001127389.

It will print FASTA records by querying the UCSC Genome Browser database (refGene table) and fetching the sequence from their DAS sequence server.

If you have a long list, see the notes on the cruzdb page about mirroring the MySQL tables locally.
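
For anyone curious what extracting a CDS from a refGene row involves, here is a rough sketch of the general idea. It is not cruzdb's actual code. It assumes refGene-style fields (exonStarts, exonEnds, cdsStart, cdsEnd, strand) with 0-based, half-open coordinates, and that the chromosome sequence is already in memory as a string. The function names are made up for illustration.

```python
# Translation table for complementing DNA, preserving case
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to the coding region; drop exons that are entirely UTR."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


def cds_sequence(chrom_seq, exon_starts, exon_ends, cds_start, cds_end, strand):
    """Join the coding parts of the exons; reverse-complement on the minus strand."""
    blocks = cds_blocks(exon_starts, exon_ends, cds_start, cds_end)
    seq = "".join(chrom_seq[s:e] for s, e in blocks)
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    # Toy chromosome: lowercase is UTR/intron, uppercase is coding
    chrom = "ccATGgtagAAATAAcc"
    print(cds_sequence(chrom, [0, 9], [5, 17], 2, 15, "+"))
```

With the toy input, the two exons are clipped to (2, 5) and (9, 15), so the UTR bases at either end and the intron in the middle are left out.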


Thanks, I'll try that.
