I have a long list of species names.
I was wondering whether a tool was available online, which could fetch the NCBI IDs?
Try TaxonKit, which is distributed as a single binary for Windows and Linux. It's really fast.
$ cat names.txt
Homo sapiens
Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila
Mouse Intracisternal A-particle
Wei Shen
uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y
$ time taxonkit name2taxid names.txt
[INFO] parsing names file: /home/shenwei/.taxonkit/names.dmp
[INFO] 1587755 names parsed
Homo sapiens 9606
Akkermansia muciniphila ATCC BAA-835 349741
Akkermansia muciniphila 239935
Mouse Intracisternal A-particle 11932
Wei Shen
uncultured murine large bowel bacterium BAC 54B 314101
Croceibacter phage P2559Y 1327037
real 0m7.217s
user 0m10.682s
sys 0m0.471s
$ time taxonkit name2taxid names.txt | taxonkit lineage -i 2
Homo sapiens 9606 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
Akkermansia muciniphila ATCC BAA-835 349741 cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
...
real 0m13.151s
user 0m23.251s
sys 0m0.948s
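Names that can't be resolved (like "Wei Shen" above) come back with no taxid, so it can be handy to pull those out for manual checking. A minimal sketch, assuming the name2taxid output is tab-separated with the name in column 1 and the taxid in column 2; `unmatched_names` is a made-up helper name, not part of taxonkit:

```shell
# Hypothetical helper (not a taxonkit command): print every name whose
# taxid column is empty. Assumes tab-separated "name<TAB>taxid" input,
# read from the given files or from stdin.
unmatched_names() {
    awk -F '\t' '$2 == "" { print $1 }' "$@"
}
```

Usage would be something like `taxonkit name2taxid names.txt | unmatched_names > todo.txt`.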
You can use NCBI esearch with db=taxonomy (documentation: https://www.ncbi.nlm.nih.gov/books/NBK25499/). For example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%22Homo+sapiens%22
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult>
<Count>1</Count>
<RetMax>1</RetMax>
<RetStart>0</RetStart>
<IdList>
<Id>9606</Id>
</IdList>
<TranslationSet/>
<TranslationStack>
<TermSet>
<Term>"Homo sapiens"[All Names]</Term>
<Field>All Names</Field>
<Count>1</Count>
<Explode>N</Explode>
</TermSet>
<OP>GROUP</OP>
</TranslationStack>
<QueryTranslation>"Homo sapiens"[All Names]</QueryTranslation>
</eSearchResult>
$ echo -e "Homo+Sapiens\nMus+Musculus\nRattus+Norvegicus" | while read L ; do echo -n "$L " && curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%22${L}%22" | xmllint --xpath '/eSearchResult/IdList/Id[1]/text()' - ; echo; done
Homo+Sapiens 9606
Mus+Musculus 10090
Rattus+Norvegicus 10116
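If your names are in a file with ordinary spaces, the same idea can be wrapped in two small functions. This is only a sketch: `esearch_url` and `names_to_taxids` are names I made up, only spaces are encoded (names with other special characters would need proper URL encoding), and the one-second pause is a conservative guess to stay under NCBI's limit of 3 requests per second without an API key.

```shell
# Hypothetical helper: build the esearch URL for one plain species name.
# Spaces become '+', and the term is wrapped in %22 (URL-encoded quotes).
esearch_url() {
    term=$(printf '%s' "$1" | sed 's/ /+/g')
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%%22%s%%22\n' "$term"
}

# Hypothetical helper: print "name<TAB>taxid" for each line of the file $1.
# Needs curl and xmllint, as in the one-liner above.
names_to_taxids() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue   # skip blank lines
        # string(...) yields an empty string when there is no hit.
        id=$(curl -s "$(esearch_url "$name")" |
             xmllint --xpath 'string(/eSearchResult/IdList/Id[1])' -)
        printf '%s\t%s\n' "$name" "$id"
        sleep 1                      # be polite to the NCBI servers
    done < "$1"
}
```

Then `names_to_taxids names.txt > name2taxid.tsv` should give a two-column table, with an empty second column for names that were not found.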
Or you can just download the raw data (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz) and use the Linux join command on names.dmp.
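For the join route, something along these lines might work once taxdump.tar.gz is unpacked. It is a rough sketch: `name2taxid_join` is a made-up function name, it matches names exactly (case-sensitive) against every name class in names.dmp, and a name that belongs to several taxa will produce several lines.

```shell
# Hypothetical helper: look up the names listed in $2 (one per line)
# in the names.dmp file given as $1, printing "name<TAB>taxid".
name2taxid_join() {
    names_dmp=$1
    queries=$2
    tab=$(printf '\t')
    tmp=$(mktemp -d) || return 1
    # names.dmp rows look like: tax_id<TAB>|<TAB>name_txt<TAB>|<TAB>...
    # so tab-delimited columns 1 and 3 hold the taxid and the name.
    cut -f 1,3 "$names_dmp" | LC_ALL=C sort -t "$tab" -k 2,2 > "$tmp/dmp.sorted"
    LC_ALL=C sort -u "$queries" > "$tmp/names.sorted"
    # Join query names (column 1) to names.dmp names (column 2);
    # -a 1 keeps unmatched query names, printed without a taxid.
    LC_ALL=C join -t "$tab" -1 1 -2 2 -a 1 "$tmp/names.sorted" "$tmp/dmp.sorted"
    rm -rf "$tmp"
}
```

Call it as `name2taxid_join names.dmp names.txt`. Both inputs are sorted under LC_ALL=C because join expects the same collation order that sort used.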
obligatory R code: