Parsing the NR database
7.5 years ago

Hello everyone. I want to extract all animal sequences from the NR database. Can anyone suggest an easy way to do this?

NGS NRdatabase BLAST • 2.7k views
7.5 years ago

The BBMap package has a tool called "filterbytaxa" which will accomplish this. However, NCBI unfortunately never labels sequences with their taxID, which makes everything a little more difficult.
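To illustrate the lookup this implies, here is a minimal sketch that finds the taxID for a single accession in one of NCBI's accession2taxid files. It assumes the usual layout of those files: a header line followed by four tab-separated columns (accession, accession.version, taxid, gi). The function name `taxid_for_accession` is made up for this example, and this is not how filterbytaxa does it internally.

```shell
# Print the taxid recorded for one accession.version (as it appears in an nr header).
# $1: gzipped accession2taxid file (assumed columns: accession, accession.version, taxid, gi)
# $2: accession.version to look up
# Exits non-zero if the accession is not present in the file.
taxid_for_accession() {
  gzip -dc "$1" | awk -F '\t' -v acc="$2" '
    NR > 1 && $2 == acc { print $3; found = 1; exit }   # NR > 1 skips the header line
    END { exit !found }'
}
```

Called as `taxid_for_accession prot.accession2taxid.gz "$ACC"`, it prints one taxID. This is a linear scan of a very large file, so it is only practical for spot-checking a few accessions, not for filtering all of nr.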

Following Manu's suggestion for using Metazoa, the usage would be like this:

filterbytaxa.sh in=nr.faa out=metazoa.faa ids=33208 include=t tree=tree.taxtree.gz gi=gitable.int1d.gz accession=prot.accession2taxid.gz,pdb.accession2taxid.gz,dead_prot.accession2taxid.gz

But you first need to get the accession files, taxonomic tree, and potentially gi tables like this:

wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/*.gz
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_prot.dmp.gz
unzip taxdmp.zip
taxtree.sh names.dmp nodes.dmp tree.taxtree.gz -Xmx16g
gitable.sh gi_taxid_nucl.dmp.gz,gi_taxid_prot.dmp.gz gitable.int1d.gz -Xmx16g

filterbytaxa.sh, taxtree.sh, and gitable.sh are part of the BBMap package. wget and unzip ship with most Linux distributions. It's easiest to put all the BBMap shell scripts on your PATH before running this. If you have the most recent copy of nr, you shouldn't need the gi numbers.
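For anyone wondering what "include everything under taxID 33208" means in practice, here is a rough sketch of the idea using nodes.dmp from the unzipped taxdmp.zip. It assumes the standard nodes.dmp layout, where fields are separated by tab-pipe-tab, the first field is the taxID and the second is its parent. The function name `is_descendant` is made up for this example. It only illustrates the tree walk and is not BBMap's implementation.

```shell
# Succeed (exit 0) if taxid $2 is $3 itself or lies below it in the taxonomy.
# $1: nodes.dmp (assumed fields: tax_id <tab>|<tab> parent tax_id <tab>|<tab> ...)
# $2: taxid to test
# $3: ancestor taxid, e.g. 33208 for Metazoa
is_descendant() {
  awk -F '\t[|]\t' -v q="$2" -v target="$3" '
    { parent[$1] = $2 }                  # remember each node and its parent
    END {
      id = q
      while (id in parent) {
        if (id == target) exit 0         # reached the requested ancestor
        if (parent[id] == id) break      # the root node is its own parent
        id = parent[id]                  # climb one level
      }
      exit 1                             # unknown taxid, or hit the root first
    }' "$1"
}
```

For example, `is_descendant nodes.dmp 9606 33208 && echo "animal"` tests whether human (taxID 9606) falls under Metazoa. Each call reloads the whole file, so this is for understanding and spot checks, not for bulk filtering.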

7.5 years ago
GenoMax 147k

"Animals" is a rather vague definition. You may want to narrow it down to, say, "vertebrates". Although the specific commands have changed somewhat with the BLAST+ package, you should get an idea of how to go about this from this post: Vertebrate Subset Nr Database? Build My Own?

Are you eventually looking to build a BLAST database, or do you just need the sequence data?


Metazoa (Taxonomy ID: 33208) could be a good candidate for "animals".
