Fetching NCBI ID for a list of species?
7.5 years ago
a.rex ▴ 350

I have a long list of species names.

Is there an online tool that can fetch the NCBI taxonomy IDs for all of them?

gene phylogeny • 3.4k views

obligatory R code:

library("myTAI")
species <- c("Homo sapiens", "Mus musculus")
# Look up the NCBI taxid for each species name
sapply(species, function(x) taxonomy(x, db = "ncbi", output = "taxid"))
7.5 years ago

Try TaxonKit, which is distributed as a single binary file for Windows and Linux. It's really fast.

$ cat names.txt 
Homo sapiens
Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila
Mouse Intracisternal A-particle
Wei Shen
uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y

$ time taxonkit name2taxid names.txt
[INFO] parsing names file: /home/shenwei/.taxonkit/names.dmp
[INFO] 1587755 names parsed
Homo sapiens    9606
Akkermansia muciniphila ATCC BAA-835    349741
Akkermansia muciniphila 239935
Mouse Intracisternal A-particle 11932
Wei Shen
uncultured murine large bowel bacterium BAC 54B 314101
Croceibacter phage P2559Y       1327037

real    0m7.217s
user    0m10.682s
sys     0m0.471s

$ time taxonkit name2taxid names.txt | taxonkit lineage -i 2
Homo sapiens    9606    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
Akkermansia muciniphila ATCC BAA-835    349741  cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
...

real    0m13.151s
user    0m23.251s
sys     0m0.948s

This is a very powerful and useful tool. Thank you.

7.5 years ago

Use NCBI esearch with db=taxonomy (documentation: https://www.ncbi.nlm.nih.gov/books/NBK25499/). For example:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%22Homo+sapiens%22


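<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">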
<eSearchResult>
  <Count>1</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>9606</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <TermSet>
      <Term>"Homo sapiens"[All Names]</Term>
      <Field>All Names</Field>
      <Count>1</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Homo sapiens"[All Names]</QueryTranslation>
</eSearchResult>
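
For anyone who would rather parse that response in a script than read it by eye, here is a minimal Python sketch using only the standard library. The function name `extract_taxids` and the trimmed `SAMPLE_RESPONSE` string are my own illustration, not part of E-utilities; the sample just repeats the relevant elements from the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the esearch response shown above (illustrative sample).
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>1</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>9606</Id>
  </IdList>
</eSearchResult>"""


def extract_taxids(xml_text):
    """Return every taxid listed under <IdList> as a list of strings."""
    root = ET.fromstring(xml_text)  # root element is <eSearchResult>
    return [node.text for node in root.findall("./IdList/Id")]


print(extract_taxids(SAMPLE_RESPONSE))
```

Returning the whole list rather than only the first Id makes it easy to spot names that match more than one taxon.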

Thanks Pierre. However, I have a list of 3000+ species, and doing each one individually would be exhausting. Is there a way of submitting the entire list to the database?

$ echo -e "Homo+Sapiens\nMus+Musculus\nRattus+Norvegicus" | while read L ; do echo -n "$L " && curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=%22${L}%22" | xmllint --xpath '/eSearchResult/IdList/Id[1]/text()' - ; echo; done
Homo+Sapiens 9606
Mus+Musculus 10090
Rattus+Norvegicus 10116
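
The same loop could be sketched in Python, which may be easier to maintain for a list of 3000+ names. This is only a sketch under a few assumptions: the helper names (`build_esearch_url`, `http_get`, `names_to_taxids`) are made up for illustration, only the first returned Id is kept (as in the shell loop above), and the 0.4 s pause is my choice to stay under NCBI's limit of three requests per second for clients without an API key.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(name):
    """Return the esearch URL for an exact-phrase query in db=taxonomy."""
    query = urllib.parse.urlencode({"db": "taxonomy", "term": f'"{name}"'})
    return f"{ESEARCH}?{query}"


def http_get(url):
    """Default fetcher: download the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def names_to_taxids(names, fetch=http_get, delay=0.4):
    """Map each name to its first taxid, or None when nothing is found."""
    results = {}
    for name in names:
        root = ET.fromstring(fetch(build_esearch_url(name)))
        results[name] = root.findtext("IdList/Id")  # None if IdList is empty
        time.sleep(delay)  # pause between requests to respect the rate limit
    return results


# Usage (performs real HTTP requests):
# with open("names.txt") as handle:
#     names = [line.strip() for line in handle if line.strip()]
# for name, taxid in names_to_taxids(names).items():
#     print(name, taxid, sep="\t")
```

The `fetch` parameter is there so the lookup logic can be exercised with a stub instead of the network.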

Or you can just download the raw data, ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz, and use the Linux join command to match your list against it.
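
If join feels awkward, the lookup against the dump can also be sketched in Python. This assumes the names.dmp file from the archive, whose fields are separated by a tab, a pipe and a tab, with each row ending in a tab and a pipe. The function names and the three sample rows are hand-written for illustration (the taxids are the ones that appear elsewhere in this thread); a real run would read the extracted file instead.

```python
from collections import defaultdict

# Hand-written rows in names.dmp layout: taxid, name, unique name, name class.
SAMPLE_NAMES_DMP = [
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n",
    "10116\t|\tRattus norvegicus\t|\t\t|\tscientific name\t|\n",
]


def load_name_index(lines):
    """Build a dict mapping each name (any name class) to a set of taxids."""
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        fields = line.split("\t|\t")
        if len(fields) < 2:
            continue  # skip malformed or empty rows
        taxid, name = fields[0], fields[1]
        index[name].add(taxid)
    return index


def lookup(names, index):
    """Return (name, sorted taxids) pairs; unknown names get an empty list."""
    return [(name, sorted(index.get(name, ()))) for name in names]


index = load_name_index(SAMPLE_NAMES_DMP)
for name, taxids in lookup(["Homo sapiens", "Wei Shen"], index):
    print(name, ",".join(taxids), sep="\t")

# With the real dump (after extracting taxdump.tar.gz):
# with open("names.dmp") as handle:
#     index = load_name_index(handle)
```

Matching here is exact and case-sensitive, so names in your list need the same spelling and capitalisation as in the dump.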
