I want to use Kraken2 to annotate my unmapped reads, but I need to use the nt database from NCBI. The nt database comes in many parts, and I've only seen one kraken2 command, which is:
kraken2-build --standard --db $DBname
where I would replace $DBname with the full FTP path to the databases. But the nt database on the FTP server is split into many parts. So do I have to run this command 40+ times, once for each part of the nt database? Is there a way for kraken2 to access the nt database all at once and stitch it together? I am having no luck figuring this part out. Thank you for your help.
try
ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
I don't think you will find a kraken2 nt database available for download. You could try Zenodo, where many custom databases have been deposited.
The available official databases are found at Kraken2 Index zone, and some useful (but ageing) databases can be found at Loman Lab Mock Community Experiments Databases.
You can use the GTDB-based indices, which are cleaner than nt: https://github.com/hcdenbakker/GTDB_Kraken
OP:
I would replace $DBname with the full ftp path to the databases
kraken2-build builds the index automatically, and $DBname is just the name you will use to refer to the database in the downstream analysis (copied from the manual: Replace "$DBNAME" above with your preferred database name/location). An example would be
kraken2-build --standard --db standard_kraken_index_folder
You can also customize which libraries are downloaded for indexing, and kraken2-build allows partial indexing as well. See https://github.com/DerrickWood/kraken2/blob/master/scripts/kraken2-build
and http://manpages.ubuntu.com/manpages/eoan/en/man1/kraken2-build.1.html
As other Biostars users have pointed out, building an index requires substantial computational resources, so you may prefer to download pre-built indices.
I am still lost as to how to direct my kraken2 commands to the nt database. I get that $DBname now is just a name I choose to call my database, but how am I telling kraken2 to build me my database using nt as the source?
Should I download the nt database somewhere in my folder and refer to it? Then it takes me back to my initial issue where the nt database is split into too many parts.
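On pointing the build at nt: kraken2-build has a --download-library option, and nt is one of the library names it accepts, so the split files on the FTP server should not need to be fetched or stitched together by hand. Below is a minimal sketch of the three build steps, not a tested recipe for any particular machine. The names DB_NAME, THREADS, DRY_RUN, run and build_nt_db are placeholders made up for illustration; only the kraken2-build options come from the tool itself. By default the script only prints the commands, so it can be checked before anything is downloaded.

```shell
#!/bin/sh
# Sketch: build a Kraken2 database from NCBI nt.
# DB_NAME, THREADS and DRY_RUN are illustrative placeholders, not Kraken2 settings.
set -eu

DB_NAME="${DB_NAME:-kraken2_nt_db}"   # folder name of your choice for the database
THREADS="${THREADS:-8}"               # threads used by the build step
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print commands, 0 = actually run them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_nt_db() {
    # 1. NCBI taxonomy, including the accession-to-taxid maps.
    run kraken2-build --download-taxonomy --db "$DB_NAME"
    # 2. The nt sequences, stored as a library inside the database folder.
    run kraken2-build --download-library nt --db "$DB_NAME"
    # 3. Build the index from everything in the library.
    run kraken2-build --build --threads "$THREADS" --db "$DB_NAME"
}

build_nt_db
```

Running it with DRY_RUN=0 executes the steps for real. As noted above, expect the nt build to need a lot of disk space, memory and time, which is the main argument for a pre-built or smaller index if one fits your question.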