How to search NCBI by taxa
3.1 years ago

I'm currently trying to get NCBI to return all virus taxa at a specific rank with a single search query.

Currently, my query:

Genus[Rank] AND virus

in the Taxonomy database returns only a few results, whereas I would expect around 120 hits, roughly the number of known virus genera.

ncbi taxonomy • 1.2k views

As far as I can see, there is no genus-level classification for viruses. For example, a query like this using Entrez Direct:

$ esearch -db taxonomy -query "viruses [orgn]" | esummary

gives us the following (one example):

<DocumentSummary><Id>2858412</Id>
    <Status>active</Status>
    <Rank>species</Rank>
    <Division>viruses</Division>
    <ScientificName>Blueberry virus T</ScientificName>
    <CommonName></CommonName>
    <TaxId>2858412</TaxId>
    <AkaTaxId>0</AkaTaxId>
    <Genus></Genus>
    <Species></Species>
    <Subsp></Subsp>
    <ModificationDate>2021/07/27 00:00</ModificationDate>
    <GenBankDivision>Viruses</GenBankDivision>
</DocumentSummary>

So the Rank seems to be set to species for all records in the taxonomy db. You can extract the fields shown above with:

$ esearch -db taxonomy -query "viruses [orgn]" | esummary | xtract -pattern DocumentSummary -element Rank,Division,ScientificName

species viruses Vibrio phage 11E33.1
species viruses Erwinia phage pEa_SNUABM_54
species viruses Erwinia phage pEa_SNUABM_49
species viruses Erwinia phage pEa_SNUABM_48

The complete lineage in the Taxonomy Browser shows that the genus of Blueberry virus T is Tepovirus. Is this a bug in esummary? Searching for the genus name does return a genus-rank record:

$ esearch -db taxonomy -query "viruses [orgn] Tepovirus" | esummary | xtract -pattern DocumentSummary -element Rank,Division,ScientificName
genus   viruses Tepovirus
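
Since genus-rank records do exist, the original question can probably be answered by combining a rank filter with a subtree filter. Below is a minimal sketch, assuming the Taxonomy database accepts the `[Subtree]` and `[Rank]` field tags in the form `txid10239[Subtree] AND genus[Rank]` (10239 being the Viruses TaxId). The helper names `build_rank_query` and `run_rank_search` are made up for illustration and are not part of EDirect.

```shell
# Build an Entrez Taxonomy query for all taxa of one rank below a root taxon.
# Assumption: the [Subtree] and [Rank] field tags behave as described above.
build_rank_query() {
    root_taxid="$1"   # e.g. 10239 for Viruses
    rank="$2"         # e.g. genus
    printf 'txid%s[Subtree] AND %s[Rank]' "$root_taxid" "$rank"
}

# Run the query with EDirect (needs esearch/esummary/xtract and network access).
# Defined here but not called automatically.
run_rank_search() {
    esearch -db taxonomy -query "$(build_rank_query "$1" "$2")" \
        | esummary \
        | xtract -pattern DocumentSummary -element TaxId,Rank,ScientificName
}

# Show the query string that would be sent.
build_rank_query 10239 genus
echo
```

With EDirect installed, `run_rank_search 10239 genus` should list one genus per line; the number of hits will depend on the current state of the Taxonomy database.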
3.1 years ago

You can use the local tool TaxonKit to list all virus TaxIds, keep only those at genus rank, and print their lineage information:

$ time taxonkit list --ids 10239 \
    | taxonkit filter --equal-to genus \
    | taxonkit lineage --show-rank --show-name \
    > virus.genus.tsv

real    0m1.842s
user    0m8.335s
sys     0m0.619s

$ head -n 5 virus.genus.tsv 
10473   Viruses;Plasmaviridae;Plasmavirus       Plasmavirus     genus
10475   Viruses;Fuselloviridae;Alphafusellovirus        Alphafusellovirus       genus
1299307 Viruses;Fuselloviridae;Betafusellovirus Betafusellovirus        genus
10483   Viruses;Polydnaviridae;Ichnovirus       Ichnovirus      genus
10485   Viruses;Polydnaviridae;Bracovirus       Bracovirus      genus

$ wc -l virus.genus.tsv 
2193 virus.genus.tsv

So there are 2193 viral genera (taxdump 2021-10-01).

Your results may differ, because NCBI Taxonomy is updated daily.
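
As a possible follow-up, the lineage column of virus.genus.tsv can be used to count genera per parent taxon. The sketch below uses only the five rows from the `head` output above as sample data, and assumes the parent is the second-to-last element of the semicolon-separated lineage. The function name `count_by_parent` is hypothetical.

```shell
# Sample rows copied from the head output above: TaxId, lineage, name, rank.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    10473   'Viruses;Plasmaviridae;Plasmavirus'         Plasmavirus       genus \
    10475   'Viruses;Fuselloviridae;Alphafusellovirus'  Alphafusellovirus genus \
    1299307 'Viruses;Fuselloviridae;Betafusellovirus'   Betafusellovirus  genus \
    10483   'Viruses;Polydnaviridae;Ichnovirus'         Ichnovirus        genus \
    10485   'Viruses;Polydnaviridae;Bracovirus'         Bracovirus        genus \
    > "$sample"

# Count genera per parent taxon (second-to-last lineage element),
# printed as "parent<TAB>count", most frequent first.
count_by_parent() {
    awk -F'\t' '
        { n = split($2, a, ";"); if (n >= 2) c[a[n-1]]++ }
        END { for (k in c) printf "%s\t%d\n", k, c[k] }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

count_by_parent "$sample"
```

On the full file, run `count_by_parent virus.genus.tsv`. Note that the parent of a genus is not always a family, so the counts are per direct parent name rather than strictly per family.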
