I'm trying to run Kraken2 on sequencing reads from a lot of novel eukaryotic samples. I'm looking to choose an appropriate database for this purpose, and the most up-to-date one on the Kraken2 databases website (https://benlangmead.github.io/aws-indexes/k2) is core_nt. However, from what I've read, the new NCBI core_nt database does not include eukaryotic chromosome assemblies (unless I am mistaken). This may be fine for tools like BLAST, where the annotated genomic regions are typically what's of interest, but Kraken2 performs a genome-wide k-mer analysis, so I'm struggling to understand how the new default database is suitable for it. Isn't it missing a lot of important information, and doesn't this reduce the accuracy of the taxonomic predictions?