Question

Fetching NCBI ID for a list of species?

1

7.5 years ago

a.rex ▴ 350

I have a long list of species names.

Is there an online tool that can fetch the NCBI taxonomy IDs for all of them?

gene phylogeny • 3.4k views

updated 7.5 years ago by shenwei356 8.7k • written 7.5 years ago by a.rex ▴ 350

0

obligatory R code:

library("myTAI")
species <- c("Homo sapiens", "Mus musculus")
# Look up the NCBI taxid for each species name
sapply(species, function(x) taxonomy(x, db = "ncbi", output = "taxid"))

7.5 years ago by cpad0112 21k

3

7.5 years ago

Pierre Lindenbaum 164k

Use NCBI esearch with db=taxonomy (see https://www.ncbi.nlm.nih.gov/books/NBK25499/). For example:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%22Homo+sapiens%22


<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult>
  <Count>1</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>9606</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <TermSet>
      <Term>"Homo sapiens"[All Names]</Term>
      <Field>All Names</Field>
      <Count>1</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Homo sapiens"[All Names]</QueryTranslation>
</eSearchResult>

7.5 years ago by Pierre Lindenbaum 164k

0

Thanks Pierre. However, I have a list of 3000+ species, and querying each one individually would be exhausting. Is there a way to submit the entire list to the database at once?

7.5 years ago by a.rex ▴ 350

1

$ echo -e "Homo+Sapiens\nMus+Musculus\nRattus+Norvegicus" | while read L ; do echo -n "$L " && curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%22${L}%22" | xmllint --xpath '/eSearchResult/IdList/Id[1]/text()' - ; echo; done
Homo+Sapiens 9606
Mus+Musculus 10090
Rattus+Norvegicus 10116
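
For a list of several thousand names, the same loop can also be written as a script that throttles its requests. What follows is only a minimal Python sketch of that idea. The helper names (`esearch_url`, `first_taxid`, `lookup`) and the 0.4-second delay are assumptions made for illustration; NCBI asks clients without an API key to stay at or below 3 requests per second.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(name):
    """Build the esearch URL for one species name, quoted as an exact phrase."""
    query = urllib.parse.urlencode({"db": "taxonomy", "term": f'"{name}"'})
    return f"{EUTILS}?{query}"


def first_taxid(esearch_xml):
    """Return the first <Id> of an eSearchResult (str or bytes), or None if there is none."""
    root = ET.fromstring(esearch_xml)
    node = root.find("./IdList/Id")
    return node.text if node is not None else None


def lookup(names, delay=0.4):
    """Yield (name, taxid) pairs; taxid is None when esearch returned no Id.

    The delay value is an assumption, chosen to stay under 3 requests/second.
    """
    for name in names:
        with urllib.request.urlopen(esearch_url(name)) as response:
            yield name, first_taxid(response.read())
        time.sleep(delay)


# Hypothetical usage (performs network requests, so it is left commented out):
# for name, taxid in lookup(["Homo sapiens", "Mus musculus", "Rattus norvegicus"]):
#     print(name, taxid, sep="\t")
```

Like the shell loop, this keeps only the first Id in each response, so a name that matches several taxa would need a closer look.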

7.5 years ago by Pierre Lindenbaum 164k

1

Or you can just download the raw data (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz) and use the Linux join command against its names.dmp file.
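
The offline lookup can be sketched in Python as well, as an alternative to join. This is a rough illustration, assuming names.dmp has been extracted from taxdump.tar.gz and that its columns are separated by a tab, a pipe and a tab, with the taxid first and the name second. The function names `load_names` and `name2taxids` are made up for this example, and the sample rows are a tiny hand-written excerpt in the same layout.

```python
import io


def load_names(handle):
    """Index a names.dmp file object as lower-cased name -> list of taxids."""
    index = {}
    for line in handle:
        fields = line.split("\t|\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        taxid, name = fields[0], fields[1]
        index.setdefault(name.lower(), []).append(taxid)
    return index


def name2taxids(names, index):
    """Pair each query name with every taxid it maps to (empty list if none)."""
    return [(name, index.get(name.strip().lower(), [])) for name in names]


# Hand-written sample rows in names.dmp layout, used here instead of the real file.
sample = io.StringIO(
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
    "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n"
)
index = load_names(sample)
for name, taxids in name2taxids(["Homo sapiens", "Mus musculus", "Wei Shen"], index):
    print(name, ",".join(taxids), sep="\t")
```

With the real file, `open("names.dmp", encoding="utf-8")` would replace the sample. A list is kept per name because the same name can belong to more than one taxon.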

7.5 years ago by Pierre Lindenbaum 164k

score 2 · Accepted Answer · 2017-05-29

Try TaxonKit, which is distributed as a single binary for Windows and Linux. It's really fast.

$ cat names.txt 
Homo sapiens
Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila
Mouse Intracisternal A-particle
Wei Shen
uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y

$ time taxonkit name2taxid names.txt
[INFO] parsing names file: /home/shenwei/.taxonkit/names.dmp
[INFO] 1587755 names parsed
Homo sapiens    9606
Akkermansia muciniphila ATCC BAA-835    349741
Akkermansia muciniphila 239935
Mouse Intracisternal A-particle 11932
Wei Shen
uncultured murine large bowel bacterium BAC 54B 314101
Croceibacter phage P2559Y       1327037

real    0m7.217s
user    0m10.682s
sys     0m0.471s
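
With thousands of names, some will come back without a taxid, as "Wei Shen" does above. A small Python sketch for separating those out is given here. It assumes the name2taxid output was saved as tab-separated text with the name in the first column and the taxid in the second; `split_matches` is a hypothetical helper, not part of TaxonKit.

```python
def split_matches(tsv_text):
    """Split name2taxid-style output into matched names and unmatched names.

    Returns (matched, unmatched): matched maps name -> list of taxids,
    unmatched lists the names whose taxid column is empty or missing.
    """
    matched, unmatched = {}, []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, taxid = line.partition("\t")
        if taxid.strip():
            matched.setdefault(name, []).append(taxid.strip())
        else:
            unmatched.append(name)
    return matched, unmatched


# Sample rows copied from the output above, with tabs assumed as separators.
output = "Homo sapiens\t9606\nWei Shen\t\nAkkermansia muciniphila\t239935\n"
matched, unmatched = split_matches(output)
print("unmatched:", unmatched)
```

The unmatched names are then a short list to check by hand for typos or synonyms.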

$ time taxonkit name2taxid names.txt | taxonkit lineage -i 2
Homo sapiens    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
Akkermansia muciniphila ATCC BAA-835    349741  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
...

real    0m13.151s
user    0m23.251s
sys     0m0.948s