Species level taxID
3.5 years ago
danvoronov ▴ 30

Hi,

I am trying to extract sequences from the NCBI nt BLAST database. For this, blastdbcmd needs species-level taxIDs, and it tells me to use get_species_taxids.sh, so that's what I did. Since I wanted all bacterial sequences, I used 2 (Bacteria) as the top-level taxID and expected the following command to give me only bacterial species IDs:

get_species_taxids.sh -t 2 > bacterial_taxIDs.txt

However, the output also contains order and genus IDs along with the species IDs, so the following command does not work:

blastdbcmd -db ncbi_nt_db -dbtype nucl -taxidlist bacterial_taxIDs.txt -out bacteria.fa -outfmt "%f"

Error: [blastdbcmd] Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).

Is there a way to keep only species-level IDs? Thanks.

Danil

blast taxIDs edirect ncbi • 1.7k views

get_species_taxids.sh to get taxids for nodes higher than species

So that script is not useful for your purposes then.

Is there a way to keep only species level IDs?
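One possible way to do this, as a rough sketch rather than a tested recipe: filter the ID list against nodes.dmp from the NCBI taxonomy dump (taxdump), whose third column holds the rank of each taxID. The function name keep_rank and the file names in the usage line are made up for illustration. Note that blastdbcmd accepts IDs at or below species level, so keeping only rank "species" would also drop strain and subspecies IDs, which may not be what you want.

```shell
# keep_rank NODES_DMP TAXID_LIST [RANK]
# Print the taxIDs from TAXID_LIST whose rank in nodes.dmp equals RANK
# (default: species). keep_rank is a hypothetical helper, not part of BLAST+.
keep_rank() {
    kr_nodes="$1"
    kr_ids="$2"
    kr_rank="${3:-species}"
    # nodes.dmp fields are separated by "<tab>|<tab>":
    #   tax_id | parent tax_id | rank | ...
    awk -F'\t[|]\t?' -v rank="$kr_rank" '
        NR == FNR { if ($3 == rank) keep[$1] = 1; next }  # first file: nodes.dmp
        ($1 in keep)                                      # second file: one taxID per line
    ' "$kr_nodes" "$kr_ids"
}
```

Usage would look something like `keep_rank nodes.dmp bacterial_taxIDs.txt > species_only.txt`, assuming nodes.dmp has been unpacked from the taxdump archive into the current directory.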


So "higher" means genus and so on in this case?


I grabbed a random set of taxIDs from the get_species_taxids.sh output and was able to run blastdbcmd on them.

$ more bact.txt 
1206589
1206590
1206591
1206592
1206642
1206643
1206644
1206645
1206646
1206647
1206648

$ blastdbcmd -db nt -outfmt %f -taxidlist bact.txt | grep ">"
>JX050253.1 Xanthomonas sp. 6 16S ribosomal RNA gene, partial sequence
>JX050254.1 Pseudomonas sp. 25 16S ribosomal RNA gene, partial sequence
>JX050261.1 Pseudomonas sp. 15 16S ribosomal RNA gene, partial sequence
>JX273662.1 Actinophytocola sp. I10A-01801 16S ribosomal RNA gene, partial sequence
>JX273665.1 Friedmanniella sp. I10A-01803 16S ribosomal RNA gene, partial sequence >JX273666.1 Friedmanniella sp. I10A-01996 16S ribosomal RNA gene, partial sequence
>JX273667.1 Frigoribacterium sp. I10A-01966 16S ribosomal RNA gene, partial sequence
>JX273670.1 Herbiconiux sp. I10A-01569 16S ribosomal RNA gene, partial sequence
>JX273678.1 Algoriphagus sp. I10B-02282 16S ribosomal RNA gene, partial sequence
>JX273680.1 Friedmanniella sp. I10A-01568 16S ribosomal RNA gene, partial sequence
>JX273681.1 Asanoa sp. I10A-01877 16S ribosomal RNA gene, partial sequence
>JX273682.1 Flavobacterium sp. I10A-02067 16S ribosomal RNA gene, partial sequence
>HE962099.1 Enterococcus sp. BGTRS1-35 partial 16S rRNA gene, strain BGTRS1-35

Do you have the taxonomy files available in the same directory where you have your nt index files?
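A quick way to check, as a sketch: the function below looks for taxdb.btd and taxdb.bti in a given directory. I am assuming those two files (from the taxdb archive NCBI distributes alongside the BLAST databases) are the taxonomy files in question; check_taxdb is just an illustrative name, and the default directory falls back to $BLASTDB or the current directory.

```shell
# check_taxdb [DIR]
# Report whether taxdb.btd and taxdb.bti exist in DIR
# (default: $BLASTDB if set, otherwise the current directory).
# Returns 0 if both are present, 1 otherwise. Hypothetical helper.
check_taxdb() {
    ct_dir="${1:-${BLASTDB:-.}}"
    ct_missing=0
    for ct_file in taxdb.btd taxdb.bti; do
        if [ -f "$ct_dir/$ct_file" ]; then
            echo "found:   $ct_dir/$ct_file"
        else
            echo "missing: $ct_dir/$ct_file"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}
```

Run it against the directory holding your nt index files, e.g. `check_taxdb /path/to/blastdb` (the path is a placeholder).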
