Hi Biostars community,
I want to obtain taxonomy information (taxon IDs) for the NCBI non-redundant library by protein accession number, and am currently doing that with the files from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/. As a second step, I then want to obtain lineage information from the taxids that I have just related to my hits.
I am trying to implement it this way rather than using NCBI's eutils (namely epost and esummary), because with those I have some problems with accession numbers that do not start with "WP_".
So my current question is: how can I obtain lineage information for those accession numbers that I have related to taxonomy information? Is there an existing implementation of taxid2lineage that I can use?
If you have suggestions on how to do the whole procedure differently, or know of an alternative way, I would be grateful for your help. Thank you!
Not exactly sure what you mean by lineages but it sounds like you already know about this: A: Convert list of Accession Numbers to Full Taxonomy
As for WP* accessions, those are special. They potentially refer to multiple entries across organisms and are going to be difficult to manage.
Thanks for your reply!
I have now been using fullnamelineage.dmp from new_taxdump.tar.gz at ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/, searching it line by line for the taxids and thereby retrieving the related lineages.
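For anyone who wants to do the same, here is a minimal Python sketch of that line-by-line lookup. It assumes the usual taxdump layout for fullnamelineage.dmp (fields separated by tab, pipe, tab; each row ending in tab, pipe; columns tax_id, name, lineage with ancestors separated by semicolons). The function names `parse_dmp_line` and `lineages_for` are my own, and the sample rows are made up for illustration, not copied from the real dump.

```python
def parse_dmp_line(line):
    """Split one taxdump row; fields are separated by TAB|TAB, rows end with TAB|."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def lineages_for(lines, taxids):
    """Return {taxid: [ancestor names..., own name]} for the requested taxids."""
    wanted = {str(t) for t in taxids}
    found = {}
    for line in lines:
        fields = parse_dmp_line(line)
        if len(fields) < 3 or fields[0] not in wanted:
            continue
        taxid, name, lineage = fields[0], fields[1], fields[2]
        # The lineage column lists ancestors only, most distant first.
        ancestors = [part.strip() for part in lineage.split(";") if part.strip()]
        found[taxid] = ancestors + [name]
        if len(found) == len(wanted):
            break  # stop scanning once every requested taxid was seen
    return found


# Made-up rows in the assumed layout (not an excerpt of the real file).
SAMPLE_DMP = [
    "1\t|\troot\t|\t\t|\n",
    "2\t|\tBacteria\t|\tcellular organisms; \t|\n",
    "561\t|\tEscherichia\t|\tcellular organisms; Bacteria; \t|\n",
]

print(lineages_for(SAMPLE_DMP, [2, 561]))
```

With the real file, the same function can be fed an open handle, e.g. `with open("fullnamelineage.dmp") as fh: lineages_for(fh, my_taxids)`. Collecting all taxids first and scanning the file once is much cheaper than one scan per taxid.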
Prior to that, I did the same to extract the taxids from the accession numbers, looking line by line through the files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/.
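A rough sketch of that accession-to-taxid step, again in Python. It assumes the accession2taxid files are tab-separated with a header row and the columns accession, accession.version, taxid, gi. The names `taxids_for` and `taxids_from_file` are hypothetical helpers, and the accessions in the sample are invented placeholders.

```python
import gzip


def taxids_for(lines, accessions):
    """Return {accession: taxid} for hits given with or without a version suffix."""
    wanted = set(accessions)
    found = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3 or cols[0] == "accession":
            continue  # skip the header row and malformed lines
        accession, versioned, taxid = cols[0], cols[1], cols[2]
        if versioned in wanted:
            found[versioned] = taxid
        if accession in wanted:
            found[accession] = taxid
    return found


def taxids_from_file(path, accessions):
    """Scan one gzipped accession2taxid file once for all requested accessions."""
    with gzip.open(path, "rt") as handle:
        return taxids_for(handle, accessions)


# Invented rows in the assumed column layout (placeholder accessions).
SAMPLE_A2T = [
    "accession\taccession.version\ttaxid\tgi\n",
    "WP_000000001\tWP_000000001.1\t562\t0\n",
    "ABC12345\tABC12345.2\t9606\t0\n",
]

print(taxids_for(SAMPLE_A2T, ["WP_000000001.1", "ABC12345"]))
```

Because these files are very large, the sketch keeps only the requested accessions in memory and streams the file once, rather than loading the whole mapping into a dictionary.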
Using the pdb.accession2taxid file from ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/, I was able to work with accession numbers starting with 'WP_' without problems. It is important, though, to download the pdb.accession2taxid.gz file as well when working with 'WP_' (FYI, for anyone who might be interested in the future). ;)
By 'lineage' I was referring to evolutionary lineage: a temporal series of populations, organisms, cells or genes connected through their ancestors and descendants, i.e. subsets of the evolutionary tree of life or of a phylogenetic tree.
Again, thank you for your help!