Is there any way to extract all the eukaryotic sequences from the NCBI non-redundant (nr) database? What exact command line would do that? Or where can I find a non-redundant eukaryotic database?
Thanks in advance for your support
DD
This tutorial recursively gets all taxIDs that belong to a higher-level taxID. They are needed to get all sequences under the top-level taxID (e.g. Eukaryota).
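To make the idea of "recursively getting all taxIDs" concrete, here is a minimal Python sketch. It is not the tutorial's own method, only an illustration of the underlying tree walk. It assumes input lines in the column layout of `nodes.dmp` from the NCBI taxonomy dump (taxID, then parent taxID, separated by `|`). The taxIDs and relationships in the toy data are made up, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def load_children(lines):
    """Build a parent -> children map from nodes.dmp-style lines."""
    children = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue
        taxid, parent = int(fields[0]), int(fields[1])
        if taxid != parent:  # the root node lists itself as its own parent
            children[parent].append(taxid)
    return children


def descendants(children, top):
    """Return the top taxID plus every taxID below it, breadth-first."""
    found = [top]
    queue = deque([top])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            found.append(child)
            queue.append(child)
    return found


# Toy data with made-up taxIDs (assumption: real input would be nodes.dmp).
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tsuperkingdom\t|",
    "11\t|\t10\t|\tphylum\t|",
    "12\t|\t11\t|\tclass\t|",
    "20\t|\t1\t|\tsuperkingdom\t|",
]
toy_children = load_children(toy_nodes)
toy_result = descendants(toy_children, 10)
print(toy_result)  # [10, 11, 12]
```

With the real taxonomy dump, the same walk started at the taxID of interest would give the list of taxIDs whose sequences need to be pulled from nr.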
Thanks for sharing the tutorial. I am following the discussion and have read the tutorial to see whether I could filter nr based on taxonomy.
DD
I am following this tutorial to filter the non-redundant database, but I got stuck at step 3. You suggest NOT downloading nr.gz directly from the NCBI FTP site, because its FASTA headers are not well formatted. However, the link you provide for downloading the pre-formatted nr database is not working.
Where can I get a well-formatted nr database that works with this tutorial?
Thanks in advance
ftp://ftp.ncbi.nlm.nih.gov/blast/db
I have a problem with the taxonkit tutorial. I got stuck at step 3 (option 1). After retrieving FASTA sequences from the pre-formatted blastdb, the perl one-liner that is used to unfold records having multiple accessions outputs an empty nr.$id.fa.gz.
Any suggestions, please?
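For readers unsure what "unfolding records having multiple accessions" means, here is a rough Python sketch of the idea. It is not the tutorial's perl one-liner. It assumes that one FASTA header carries several entries joined by a separator (Ctrl-A, `\x01`, is used as the default here; that is an assumption and the separator in your file may differ), and that the accession is the first word of each entry. The accessions, descriptions and function names below are hypothetical.

```python
def unfold_record(header, sequence, sep="\x01"):
    """Split a merged header into one (header, sequence) pair per entry.

    Assumption: entries in the merged header are joined by `sep`.
    """
    entries = [e.strip() for e in header.split(sep)]
    return [(e, sequence) for e in entries if e]


def keep_wanted(records, wanted_accessions):
    """Keep unfolded records whose accession (first word) is in the set."""
    return [(h, s) for h, s in records if h.split()[0] in wanted_accessions]


# Hypothetical merged record: two entries sharing one sequence.
merged_header = "ACC_A.1 protein one [Org A]\x01ACC_B.1 protein one [Org B]"
unfolded = unfold_record(merged_header, "MKV")
selected = keep_wanted(unfolded, {"ACC_B.1"})
print(selected)  # [('ACC_B.1 protein one [Org B]', 'MKV')]
```

In this sketch the filter returns an empty list whenever none of the first words match the accession set, so printing a few headers next to a few accessions from the list is a cheap first check when the output is empty.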
so, is it 6656 or 6665?
Sorry, that was a mistake when editing the message. The ID is 6656.
What's your taxid? 6656 is just the example.
I need to build a new database for Chlorophyta, taxid 3041. The perl one-liner outputs an empty file for whatever taxid I use. I could not even reproduce the tutorial. It could be trivial, but I don't know why it fails.
thank you for your assistance
DD
please paste results of
I'll check it on Monday.
Thanks, they are copied here:
Note that I could not reproduce the tutorial. Option 2 of step 3 works, but I could not get past step 4 (makeblastdb). You did not explain how to deal with this error when a user follows option 2 of step 3. I tried to remove duplicate sequences, but I suspect this is not the right approach.
Thanks again for your assistance
DD
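The error message itself is not quoted above, so the following is only a guess at one possible cause: if makeblastdb is complaining about duplicate sequence IDs, a minimal Python sketch for keeping only the first record per ID could look like this. It assumes the ID is the first word of each FASTA header; the function names and the toy records are hypothetical, and whether dropping duplicates is appropriate depends on how the duplicates arose.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_duplicate_ids(records):
    """Keep only the first record seen for each sequence ID."""
    seen = set()
    for header, seq in records:
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id in seen:
            continue
        seen.add(seq_id)
        yield header, seq


# Hypothetical input with one repeated ID.
toy_fasta = [
    ">ACC_A.1 first copy\n", "MKV\n", "LLT\n",
    ">ACC_B.1 other protein\n", "MST\n",
    ">ACC_A.1 second copy\n", "MKV\n",
]
unique_records = list(drop_duplicate_ids(read_fasta(toy_fasta)))
print(unique_records)
```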
Well, I tried 3041, and every step worked fine.
I also added a faster way to get nr.$taxid.fa.gz from nr.fa.gz, in step 3 option 1.
I can send nr.3041.fa.gz (70M) to you via email, if you give me your address or email me.
Thanks a lot! I would be happy if you could send the nr.$id.fa to me at derilus.dieunel@upr.edu or dieunelderilus@gmail.com.
But I will keep testing to see where I am going wrong, because this is a very interesting tool.