How to get Taxonomy and Taxon ID for an Accession number using R/Python?
4.9 years ago
Frieda ▴ 60

Hello,

I have a database that stores the taxonomy and taxon ID for each accession number. It already holds millions of accession numbers with their corresponding taxonomy and taxon ID, but it is not yet complete. The database is used to assign taxonomy to BLAST results. I plan to complete it by collecting, in each BLAST search, the accession numbers that are not found in my local database.

Does anyone have suggestions for how I can retrieve the taxonomy and taxon ID for a set of accession numbers from NCBI using R or Python?

Thanks

R PYTHON accession number taxonomy • 4.9k views

And another related previous post.

4.9 years ago

For R, check the taxonomizr package.


I already know about this package, but I do not want to download a complete database. My intention is to complete my existing database. Thanks anyways.
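
The gap-filling step described here could be sketched in Python roughly as follows. This is only an illustration under assumptions: the thread does not say how the local database is stored, so the sketch assumes a SQLite file with a hypothetical table `acc2tax(accession, taxid)`, and the function name `missing_accessions` is made up for the example.

```python
import sqlite3


def missing_accessions(conn, accessions, chunk_size=500):
    """Return the accessions that are absent from the (hypothetical) acc2tax table.

    Duplicates are dropped and the input order is preserved.
    """
    unique = list(dict.fromkeys(accessions))  # de-duplicate, keep first occurrence
    known = set()
    # Query in chunks so the number of SQL placeholders stays small.
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT accession FROM acc2tax WHERE accession IN ({placeholders})",
            chunk,
        )
        known.update(row[0] for row in rows)
    return [acc for acc in unique if acc not in known]


# Toy in-memory database standing in for the real one (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acc2tax (accession TEXT PRIMARY KEY, taxid INTEGER)")
conn.execute("INSERT INTO acc2tax VALUES ('XP_022900619.1', 166361)")

blast_hits = ["XP_022900619.1", "XP_018333511.1", "XP_018333511.1"]
print(missing_accessions(conn, blast_hits))
```

Only the accessions returned by this step would then need to be looked up at NCBI, so the existing database is extended rather than rebuilt.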

4.9 years ago
Chirag Parsania ★ 2.0k

I have been writing an R package for taxonomy mapping of BLAST results. The package is at a very early stage, but it already has functions that can solve your problem. You can install the package using devtools::install_github("cparsania/phyloR").

See the example below

    library(phyloR)

    x <- c("XP_022900619.1", "XP_022900618.1", "XP_018333511.1", "XP_018573075.1")
    ## Get the NCBI taxon ID for each accession; keep the result for the next step
    gid_to_taxid <- genbank2uid_tbl(x = x)
    gid_to_taxid
    #> No ENTREZ API key provided
    #>  Get one via taxize::use_entrez()
    #> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
    #> ✔ done.  Time taken -2.81795477867126
    #> # A tibble: 4 x 8
    #>   x      taxid  class match multiple_matches pattern_match uri      name   
    #>   <chr>  <chr>  <chr> <chr> <lgl>            <lgl>         <chr>    <chr>  
    #> 1 XP_02… 166361 uid   found FALSE            FALSE         https:/… trypsi…
    #> 2 XP_02… 166361 uid   found FALSE            FALSE         https:/… trypsi…
    #> 3 XP_01… 224129 uid   found FALSE            FALSE         https:/… trypsi…
    #> 4 XP_01… 217634 uid   found FALSE            FALSE         https:/… trypsi…

    ## Get phylogenetic rank for a given ncbi taxonomy id.

    phyloR::get_taxon_rank(gid_to_taxid$taxid, rank = "kingdom")
    #> ● Starting rank search...
    #> ✓ done.  Time taken -0.353924989700317
    #> # A tibble: 3 x 4
    #>   query_taxon kingdom kingdom_id rank   
    #>   <chr>       <chr>   <chr>      <chr>  
    #> 1 166361      Metazoa 33208      kingdom
    #> 2 224129      Metazoa 33208      kingdom
    #> 3 217634      Metazoa 33208      kingdom

Set the rank argument to a taxonomic level (for example "kingdom") to get the taxonomy at that level for a given taxon ID.

Created on 2020-02-10 by the reprex package (v0.3.0)
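
Since the question also mentions Python, here is a rough Python counterpart to the accession-to-taxon-ID lookup, using only the standard library and the NCBI E-utilities `esummary` endpoint. Treat it as a sketch under assumptions rather than a reference implementation: the function names are made up for this example, and it assumes the JSON summaries for sequence databases carry `accessionversion` and `taxid` fields, which is worth checking against a real response before relying on it.

```python
import json
import urllib.parse
import urllib.request

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def parse_esummary(payload):
    """Map accession.version -> taxon ID from a decoded esummary JSON payload.

    Assumes each record has 'accessionversion' and 'taxid' keys;
    records missing either key are skipped.
    """
    result = payload.get("result", {})
    mapping = {}
    for uid in result.get("uids", []):
        record = result.get(uid, {})
        accession = record.get("accessionversion")
        taxid = record.get("taxid")
        if accession and taxid is not None:
            mapping[accession] = int(taxid)
    return mapping


def fetch_taxids(accessions, db="protein", api_key=None):
    """Request summaries for a batch of accessions and return {accession: taxid}.

    Sent as a POST so that long ID lists do not hit URL length limits.
    Not called in this sketch because it needs network access.
    """
    params = {"db": db, "id": ",".join(accessions), "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(ESUMMARY_URL, data=data, timeout=30) as response:
        return parse_esummary(json.load(response))


# Hand-written sample in the assumed response shape (the uid "1" is a placeholder,
# not a real NCBI identifier); the accession/taxid pair comes from the output above.
sample = {
    "result": {
        "uids": ["1"],
        "1": {"accessionversion": "XP_022900619.1", "taxid": 166361},
    }
}
print(parse_esummary(sample))
```

A real call would look like `fetch_taxids(["XP_022900619.1", "XP_018333511.1"])`. NCBI limits clients without an API key to 3 requests per second, so large sets of missing accessions are better sent in batches with a pause between requests. The lineage for each taxon ID would need a second lookup against the taxonomy database, which is not shown here.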
