Is there an API/script/package to find the scientific name for a given common name or the variation of the scientific name at hand?
2.6 years ago
Rijan ▴ 30

I have a list of some 200 common plant names for which I need the scientific names. I could google each one, but if possible I want to use a software package that automates the task (for the sake of reproducibility and my own sanity).

I thought of writing a Python script that queries Google (for spelling correction and context-dependent search), grabs the Wikipedia URL from the results, and then extracts the scientific name with BeautifulSoup, but it turns out Google hits you with a CAPTCHA pretty fast.

Is there a package/tool/method that comes closest to being an industry standard?

API Scientific nomenclature • 1.3k views
2.6 years ago
GenoMax 147k

Using EntrezDirect:

$ esearch -db taxonomy -query "thale cress" | esummary | xtract -pattern DocumentSummary -element CommonName,ScientificName
thale cress     Arabidopsis thaliana
$ esearch -db taxonomy -query "zebra fish" | esummary | xtract -pattern DocumentSummary -element CommonName,ScientificName
zebra fish      Girella zebra
zebrafish       Danio rerio
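
For a list of names, the same two-step lookup (esearch, then esummary) could also be scripted in Python against the NCBI E-utilities web endpoints. The following is only a sketch, not part of EntrezDirect: the function names are made up for illustration, and the JSON field names `commonname` and `scientificname` are my assumption about how esummary reports taxonomy records in JSON mode, so check them against a real reply before relying on this.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def get_json(endpoint, **params):
    """Call one E-utilities endpoint and decode its JSON reply."""
    params["retmode"] = "json"
    url = f"{EUTILS}/{endpoint}?{urllib.parse.urlencode(params)}"
    time.sleep(0.34)  # stay under 3 requests/second (NCBI's limit without an API key)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_names(summary):
    """Pull (common name, scientific name) pairs out of an esummary reply.

    Assumes the reply lists record IDs under result["uids"] and keys each
    record by its ID.
    """
    result = summary.get("result", {})
    return [
        (result[uid].get("commonname", ""), result[uid].get("scientificname", ""))
        for uid in result.get("uids", [])
    ]


def lookup(common_name):
    """Return every (common, scientific) pair the taxonomy database reports."""
    search = get_json("esearch.fcgi", db="taxonomy", term=common_name)
    ids = search["esearchresult"]["idlist"]
    if not ids:
        return []  # nothing matched this name
    return extract_names(get_json("esummary.fcgi", db="taxonomy", id=",".join(ids)))
```

Calling `lookup(name)` for each entry in the list and writing the pairs to a tab-separated file would give a table in the same shape as the output above. A query can return several rows, as "zebra fish" does, so the result still needs checking by hand.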
2.6 years ago

I have never used it, but I know that the taxize package in R is meant for this task, and I suppose there are similar packages for Python too. If you need or want to write something yourself, I would at least start from a more structured source dataset such as Wikispecies, which features vernacular names in many languages.

2.6 years ago
Michael 55k

The NCBI taxonomy and its related software packages can identify the species from a common name as well; however, there is no state of the art for common names, IMO. They can be non-unique, misspelled, redundant, and so on. Anyway, the answers to this question will do a good job in many cases: From scientific name to taxonomy information entrez

For example, my script using the BioPerl Taxonomy modules can often resolve the names, but TaxonKit may work too.

Whatever you do, you should manually curate your results. Anyways, BioPerl's taxonomy implementation together with NCBI taxonomy is sort of the "professional standard" you are looking for.

example:

% ./getLCA.pl thale_cress
nodesfile or namesfile not found, using entrez online data

######################### thale_cress ##############################
ID: 3702 Sci.name: Arabidopsis thaliana
 [thale cress Arabis thaliana mouse-ear cress thale-cress authorityArabidopsis thaliana (L.) Heynh., 1842 
  authorityArabis 
  thaliana L., 1753 misspellingArabidopsis thaliana (thale cress) misspellingArabidopsis_thaliana misspellingArbisopsis 
  thaliana misspellingthale kress ]
 Phylum: Streptophyta
 cellular organisms:Eukaryota:Viridiplantae:Streptophyta:Streptophytina:Embryophyta:Tracheophyta:Euphyllophyta:
 Spermatophyta:Magnoliopsida:Mesangiospermae:eudicotyledons:Gunneridae:Pentapetalae:rosids:malvids:
 Brassicales:Brassicaceae:Camelineae:Arabidopsis 
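
To make the manual curation step manageable for a list of about 200 names, it may help to sort the lookup results into names that resolved to a single scientific name, names with several candidates, and names with no hit. The following is a minimal sketch of that triage. The function name and the input layout (a dict from query to the list of scientific names returned) are hypothetical and not tied to BioPerl or any other tool mentioned here.

```python
def triage(hits_by_query):
    """Split lookup results into resolved, ambiguous, and unresolved queries.

    hits_by_query maps each common name to the list of scientific names
    that some lookup tool returned for it (an assumed layout).
    """
    resolved, ambiguous, missing = {}, {}, []
    for query, names in hits_by_query.items():
        unique = sorted(set(names))  # drop duplicate hits, keep a stable order
        if len(unique) == 1:
            resolved[query] = unique[0]
        elif unique:
            ambiguous[query] = unique  # several candidates: needs a human decision
        else:
            missing.append(query)  # no hit: possibly misspelled or a local name
    return resolved, ambiguous, missing


# "no such plant" is a made-up placeholder for a name that returns nothing.
resolved, ambiguous, missing = triage({
    "thale cress": ["Arabidopsis thaliana"],
    "zebra fish": ["Girella zebra", "Danio rerio"],
    "no such plant": [],
})
print(resolved)
print(ambiguous)
print(missing)
```

Only the ambiguous and missing groups then need to be looked at by hand, although spot-checking some of the resolved names is still sensible.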