Get ncRNA genes in FASTA format from the IMG database
13.1 years ago
pmenzel ▴ 310

Hi all,

I would like to retrieve annotated genes from the Integrated Microbial Genomes (IMG) database, but the web interface and its "cart" are giving me trouble. I know that I can download sets of FASTA files for specific organisms from the FTP site once I know their ID, e.g. ftp://ftp.jgi-psf.org/pub/IMG/img_w_v340/648028003.tar.gz

In this archive, I find a FASTA file with all annotated genes, which mixes protein-coding and RNA genes.

But I want only the RNA genes, which I can list on the website with: http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=TaxonDetail&page=rnas&taxon_oid=648028003

But then there is no way to download them in one FASTA file; I can only add them to the cart and view them there in FASTA format inside the browser.

Does anybody know if there is a better API for this database? What did I miss here?

cheers, Peter

bacteria • 3.3k views
13.1 years ago

I think the easiest way to automate this would be to download the tab-delimited feature file that contains the features you are interested in, as well as the FASTA file that contains all sequences.

Transform the feature file to BED format, filter it to keep only the RNA genes, and then use the fastaFromBed utility from the bedtools package with the FASTA file.
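As a rough sketch of the conversion and filtering step, something like the following Python could work. Note that the column names used here (`gene_oid`, `locus_type`, `scaffold`, `start`, `end`, `strand`) are assumptions for illustration and may not match the actual IMG feature file, so check the header of your file and adjust the arguments. It also assumes that coordinates in the feature file are 1-based and inclusive, and that RNA genes can be recognized by a type value ending in "RNA" (tRNA, rRNA, ...).

```python
import csv
import io


def features_to_bed(lines, type_col="locus_type", chrom_col="scaffold",
                    start_col="start", end_col="end",
                    strand_col="strand", name_col="gene_oid"):
    """Return BED6 lines for the RNA genes in a tab-delimited feature table.

    All column names are hypothetical defaults; adapt them to the real header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    bed = []
    for row in reader:
        # Keep only rows whose type ends in "RNA" (assumed naming scheme).
        if not row[type_col].strip().upper().endswith("RNA"):
            continue
        # BED is 0-based, half-open; the input is assumed 1-based, inclusive.
        start = int(row[start_col]) - 1
        end = int(row[end_col])
        bed.append("\t".join([row[chrom_col], str(start), str(end),
                              row[name_col], "0", row[strand_col]]))
    return bed


if __name__ == "__main__":
    # Tiny made-up table, only to show the expected input layout.
    demo = io.StringIO(
        "gene_oid\tlocus_type\tscaffold\tstart\tend\tstrand\n"
        "1\tCDS\tc1\t10\t100\t+\n"
        "2\ttRNA\tc1\t200\t275\t-\n"
    )
    print("\n".join(features_to_bed(demo)))
```

After writing the result to a file (say `rna.bed`, a placeholder name), the sequences can be extracted with `fastaFromBed -fi genome.fna -bed rna.bed -s -fo rna.fasta`, where `genome.fna` stands for the FASTA file with the scaffold sequences and `-s` reverse-complements features on the minus strand.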

If you have problems with either of these steps, post a new question; I am sure we can provide you with a quick script. (The above will work on Mac or Linux; if you are on Windows, install Cygwin.)
