Hi guys,
Does anyone know how I can get a TaxID mapping file for the NR or UniProt database?
Background:
I use Diamond for my de novo transcriptome annotation. My next goal is to use the hits TSV file in blobtools for contamination detection. To do that, I need my query transcript IDs with the corresponding subject TaxIDs in the hits.tsv file. Diamond doesn't give that information, but I can use the blobtools taxify option to match the corresponding TaxIDs to my subject hits. I read the blobtools documentation, and to do that I need a TaxID mapping file for the database that I used for annotation, i.e. a file that lists the TaxID for each subject sequence ID.
I am not sure how to get that file so please help. :)
The nodesDB file should have been installed if you used the "Install" script for blobtools, according to: https://blobtools.readme.io/docs/taxonomy-database
You can find the NCBI taxonomy database files here: https://ftp.ncbi.nih.gov/pub/taxonomy/ Take a look at https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt for the contents.
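If you want to inspect the taxonomy dump yourself, here is a rough Python sketch for reading nodes.dmp. It relies on the layout described in taxdump_readme.txt (fields separated by tab, pipe, tab, with tax_id, parent tax_id, and rank as the first three fields). The function names parse_nodes and lineage are made up for illustration and are not part of blobtools.

```python
def parse_nodes(lines):
    """Return {tax_id: (parent_tax_id, rank)} from nodes.dmp lines.

    Assumes the taxdump layout: fields separated by "\t|\t" and each
    row terminated by "\t|".
    """
    nodes = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        fields = line.split("\t|\t")
        if len(fields) >= 3:
            nodes[fields[0]] = (fields[1], fields[2])
    return nodes


def lineage(taxid, nodes):
    """Return the list of tax_ids from `taxid` up to the root."""
    path = []
    while taxid in nodes:
        path.append(taxid)
        parent = nodes[taxid][0]
        # The root node lists itself as its parent; also stop on any cycle.
        if parent == taxid or parent in path:
            break
        taxid = parent
    return path
```

With the real file this would be called as `parse_nodes(open("nodes.dmp"))`, where the file name is whatever you unpacked from the taxdump archive.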
Thank you, I'll look at these files/documents.
If I understood correctly, I might need the prot.accession2taxid.gz file? According to the documentation, column 2 is Accession.version and column 3 is TaxID. I should download that file from NCBI, unpack it, and then do:
Does that make sense?
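To make the matching step concrete, here is a minimal Python sketch of the idea, not the blobtools implementation. It reads Accession.version/TaxID pairs from the mapping file (columns 2 and 3, as described in the documentation) and appends a TaxID to each hit line. The function names, the "0" placeholder for unmatched subjects, and the assumption that the subject ID sits in the second column of the Diamond tabular output are illustrative choices on my part.

```python
import gzip


def load_taxid_map(path, wanted):
    """Return {accession.version: taxid} for the accessions in `wanted`.

    Assumes the NCBI layout: one header line, then tab-separated columns
    accession, accession.version, taxid, gi.
    """
    opener = gzip.open if path.endswith(".gz") else open
    mapping = {}
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[1] in wanted:
                mapping[fields[1]] = fields[2]
    return mapping


def add_taxids(hit_lines, mapping, subject_col=1, missing="0"):
    """Yield each hit line with the subject's TaxID appended as a new column.

    `subject_col` is 0-based; subjects without a mapping get `missing`.
    """
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        taxid = mapping.get(fields[subject_col], missing)
        yield "\t".join(fields + [taxid])
```

Because the mapping file is very large, the sketch only keeps entries for subject IDs that actually occur in the hits (the `wanted` set), which you would collect from hits.tsv first.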
Did anyone try this?
I also saw this post about getting taxonomy info in Diamond output. Still, it seems it has to be incorporated in the makedb step, and I might get more than one TaxID per hit according to the Diamond documentation, which I am not sure will work with blobtools.
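If the multiple-TaxID case does turn out to be a problem, one possible workaround is to collapse the field to a single TaxID before handing the file to blobtools. The sketch below assumes the TaxIDs arrive as one semicolon-separated field and simply keeps the first one. Both that separator and the "keep the first" rule are assumptions to check against the Diamond documentation and your own data, and the function names are hypothetical.

```python
def first_taxid(taxid_field, missing="0"):
    """Return the first TaxID from a ';'-separated field, or `missing` if empty."""
    for part in taxid_field.split(";"):
        part = part.strip()
        if part:
            return part
    return missing


def collapse_taxid_column(hit_lines, taxid_col):
    """Yield hit lines with the TaxID column (0-based) reduced to one TaxID."""
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        fields[taxid_col] = first_taxid(fields[taxid_col])
        yield "\t".join(fields)
```

Keeping the first TaxID is the crudest choice; taking the lowest common ancestor of the listed TaxIDs would be more defensible but needs the taxonomy tree.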