Hello, I have a fasta file contains thousands of records like this:
X17276.1 Giant Panda satellite 1 DNA GATCCTCCCCAGGCCCCTACACCCAATGTGGAACCGGGGTCCCGAATGAAAATGCTGCTGTTCCCTGGAGGTGTTTTCCT GGACGCTCTGCTTTGTTACCAATGAGAAGGGCGCTGAATCCTCGAAAATCCTGACCCTTTTAATTCATGCTCCCTTACTC ACGAGAGATGATGATCGTTGATATTTCCCTGGACTGTGTGGGGTCTCAGAGACCACTATGGGGCACTCTCGTCAGGCTTC CGCGACCACGTTCCCTCATGTTTCCCTATTAACGAAGGGTGATGATAGTGCTAAGACGGTCCCTGTACGGTGTTGTTTCT GACAGACGTGTTTTGGGCCTTTTCGTTCCATTGCCGCCAGCAGTTTTGACAGGATTTCCCCAGGGAGCAAACTTTTCGAT GGAAACGGGTTTTGGCCGAATTGTCTTTCTCAGTGCTGTGTTCGTCGTGTTTCACTCACGGTACCAAAACACCTTGATTA TTGTTCCACCCTCCATAAGGCCGTCGTGACTTCAAGGGCTTTCCCCTCAAACTTTGTTTCTTGGTTCTACGGGCTG
now want to add the taxID based on the GI for each line, which should be the best way?
Thank you!!
Thank you Brian!! I figured it out with the JGI.
Hello Brian,
I am in a similar position to the original poster, having a fasta file containing hundreds of thousands of sequences for many species which currently don't have taxids. Here is my current format:
My understanding is that BBMap will allow me to add the taxid using this "renamebytaxa.sh" tool, which is exactly what I would like to do. I've downloaded BBMap and ran the fetchtaxonomy shell script, but I can't seem to find a "renamebytaxa" file. I've found "rename.sh", but I can't see a simple way to use this to the same effect.
I also tried following the instructions at: http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/taxonomy-guide/ but I keep getting an error for every shell script I try to run (taxtree,sh, for example) saying "could not find or load main class". Do you have any advice for this issue? Any help is massively appreciated.
Oh, sorry... not sure why I called it "renamebytaxa.sh"; it's actually "gi2taxid.sh". And the instructions on the website are a little outdated; the latest version of documentation is always distributed with BBMap (in this case/docs/guides/TaxonomyGuide.txt, which still could use a little updating).
In general, for NCBI-style names, the process is like this:
...where /path/to/taxonomy/ is the location where you ran fetchtaxonomy.sh. Even though it's called "gi2taxid.sh", it works with both gi numbers or accessions, as long as the headers are in the standard NCBI format. It alternatively works with the Silva header format if you add the "silva" flag.
The before/after results look like this:
...where 9646 is the NCBI tax ID.
Hello Brian,
Many thanks for your kind advice. I am, however, having issues using the shell scripts in the package. Whilst "fetchtaxonomy" seems to run at first, it meets an error when the other shell scripts become involved:
Errors occur for each of the shell scripts individually, too:
I'm assuming this is an issue with how I have things set up? Any advice would be massively appreciated!
Many thanks,
Jordan