Python assistance -ete3
2.7 years ago
Gino • 0

Hey guys, I'm new to Python and to bioinformatics in general.

I'm currently working on a project that requires me to translate information from two Excel files (each with a column for species/common name) into taxonomy IDs. Since the original species/common names are not always accurate, I found a function online that finds the best-matching correct species name. There is also a function that translates a species name into a taxonomy ID. Both functions are part of ETE3.

I don't know what values/variables to pass to the functions (see the calls near the end of the post) to get a result.

My current code in Python (Visual Studio Code), after activating Anaconda, is:

import math  # math, os and six are used by the ETE3 functions pasted below
import os
import pandas as pd
import numpy as np
import six
import ete3
%pip install ncbi-taxonomist

Which gives: "Note: you may need to restart the kernel to use updated packages."

from ete3 import NCBITaxa
ncbi = NCBITaxa()

def get_fuzzy_name_translation(self, name, sim=0.9):
    """Given an inexact species name, returns the best match in the NCBI database of taxa names.
    :argument 0.9 sim: Min word similarity to report a match (from 0 to 1).
    :return: taxid, species-name-match, match-score
    """

    import sqlite3.dbapi2 as dbapi2
    _db = dbapi2.connect(self.dbfile)
    module_path = os.path.split(os.path.realpath(__file__))[0]
    _db.execute("select load_extension('%s')" % os.path.join(module_path,

    print("Trying fuzzy search for %s" % name)
    maxdiffs = math.ceil(len(name) * (1-sim))
    cmd = 'SELECT taxid, spname, LEVENSHTEIN(spname, "%s") AS sim  FROM species WHERE sim<=%s ORDER BY sim LIMIT 1;' % (name, maxdiffs)
    taxid, spname, score = None, None, len(name)
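    result = _db.execute(cmd)
    try:
        taxid, spname, score = result.fetchone()
    except TypeError:
        # fetchone() returned None: no species matched, so try the synonym table.
        cmd = 'SELECT taxid, spname, LEVENSHTEIN(spname, "%s") AS sim  FROM synonym WHERE sim<=%s ORDER BY sim LIMIT 1;' % (name, maxdiffs)
        result = _db.execute(cmd)
        try:
            taxid, spname, score = result.fetchone()
        except TypeError:
            pass  # no synonym matched either; keep the defaults set above
        else:
            taxid = int(taxid)
    else:
        taxid = int(taxid)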

    norm_score = 1 - (float(score)/len(name))
    if taxid:
        print("FOUND!    %s taxid:%s score:%s (%s)" %(spname, taxid, score, norm_score))

    return taxid, spname, norm_score


def get_name_translator(self, names):
    """Given a list of taxid scientific names, returns a dictionary translating them into their corresponding taxids.
    Exact name match is required for translation.
    """

    name2id = {}
    #name2realname = {}
    name2origname = {}
    for n in names:
        name2origname[n.lower()] = n

    names = set(name2origname.keys())

    query = ','.join(['"%s"' %n for n in six.iterkeys(name2origname)])
    cmd = 'select spname, taxid from species where spname IN (%s)' %query
    result = self.db.execute('select spname, taxid from species where spname IN (%s)' %query)
    for sp, taxid in result.fetchall():
        oname = name2origname[sp.lower()]
        name2id.setdefault(oname, []).append(taxid)
        #name2realname[oname] = sp
    missing =  names - set([n.lower() for n in name2id.keys()])
    if missing:
        query = ','.join(['"%s"' %n for n in missing])
        result = self.db.execute('select spname, taxid from synonym where spname IN (%s)' %query)
        for sp, taxid in result.fetchall():
            oname = name2origname[sp.lower()]
            name2id.setdefault(oname, []).append(taxid)
            #name2realname[oname] = sp
    return name2id

All of this code runs fine. My problem is figuring out which values/variables to use in place of the ?'s to turn an inaccurate species name into an accurate one using:

    from ete3 import NCBITaxa
    ncbi = NCBITaxa()
    fuzzy_name = ncbi.get_fuzzy_name_translation(?,?,?)
    print (dog?,0.9?)

Also, how do I get taxonomy IDs using:

    from ete3 import NCBITaxa
    ncbi = NCBITaxa()
    taxid_name = ncbi.get_name_translator(?)
    print (?)
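
One possible way to fill in the ?'s, as a minimal sketch: both functions are methods of an `NCBITaxa` instance, so `self` is supplied automatically and only a name (plus an optional `sim`) or a list of names is passed in. The helper `names_to_taxids` below is hypothetical and not part of ETE3. It tries the exact lookup first and falls back to the fuzzy one, and it assumes the return shapes described in the docstrings above: a dict mapping each name to a list of taxids, and a `(taxid, name, score)` tuple.

```python
def names_to_taxids(ncbi, names, sim=0.9):
    """Map each input name to (taxid or None, matched name or None, score).

    `ncbi` is expected to be an ete3 NCBITaxa instance, or any object
    offering the same two methods (hypothetical helper, not ETE3 API).
    """
    # Exact lookup: assumed to return {name: [taxid, ...]} for names it knows.
    exact = ncbi.get_name_translator(names)
    results = {}
    for name in names:
        if name in exact:
            # Exact hit: take the first taxid reported for this name.
            results[name] = (exact[name][0], name, 1.0)
        else:
            # No exact hit: fall back to the fuzzy lookup for this one name.
            taxid, match, score = ncbi.get_fuzzy_name_translation(name, sim)
            results[name] = (taxid, match, score)
    return results


# Usage sketch (assumes ete3 is installed and its taxonomy database is set up):
# from ete3 import NCBITaxa
# ncbi = NCBITaxa()
# print(names_to_taxids(ncbi, ["Homo sapiens", "Canis lupus familiars"]))
```

With pandas, the list of names could come from something like `pd.read_excel("file.xlsx")["species"].dropna().tolist()`, where the file and column names are placeholders. Note that the fuzzy lookup relies on the SQLite extension loaded in the pasted source, so it may fail in environments where that extension cannot be loaded.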

I ran

help(get_fuzzy_name_translation)
help(get_name_translator)

and got

Help on function get_fuzzy_name_translation in module __main__:

get_fuzzy_name_translation(self, name, sim=0.9)
    Given an inexact species name, returns the best match in the NCBI database of taxa names.
    :argument 0.9 sim: Min word similarity to report a match (from 0 to 1).
    :return: taxid, species-name-match, match-score

Help on function get_name_translator in module __main__:

get_name_translator(self, names)
    Given a list of taxid scientific names, returns a dictionary translating them into their corresponding taxids.
    Exact name match is required for translation.
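
To make the `sim` argument concrete, here is a minimal pure-Python sketch of the arithmetic in the pasted source of `get_fuzzy_name_translation`: `maxdiffs = ceil(len(name) * (1 - sim))` is the largest edit distance accepted, and the reported score is `1 - distance / len(name)`. The functions `levenshtein` and `best_fuzzy_match` are illustrative stand-ins, not ETE3 functions. ETE3 itself does the comparison inside SQLite against the NCBI name tables.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def best_fuzzy_match(name, candidates, sim=0.9):
    """Return (best candidate or None, normalised score in [0, 1]).

    Mirrors the thresholding in the pasted source; assumes `name` is non-empty.
    """
    maxdiffs = math.ceil(len(name) * (1 - sim))  # max edits tolerated
    best, best_dist = None, len(name)            # same defaults as the source
    for cand in candidates:
        dist = levenshtein(cand, name)
        if dist <= maxdiffs and (best is None or dist < best_dist):
            best, best_dist = cand, dist
    return best, 1 - best_dist / len(name)
```

For example, `best_fuzzy_match("Homo sapien", ["Homo sapiens", "Pan troglodytes"])` picks "Homo sapiens": the missing final letter is one edit, so the score is 1 - 1/11, about 0.91. A common name such as "dog" is many edits away from any scientific name, so under this scheme it returns no match at the default `sim`.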

I apologize for the long post and the poor code formatting; I tried my best to present the information as clearly as possible.

Any pointers would be great! I'm working on it every day to try and figure it out.

etetoolkit • 892 views
