Question about Kraken2 core_nt vs nt database results.
10 weeks ago
Mark ▴ 30

I'm trying to run Kraken2 on sequencing reads from many novel eukaryotic samples, and I need to choose an appropriate database. The most up-to-date option on the Kraken2 databases website (https://benlangmead.github.io/aws-indexes/k2) is core_nt. However, from what I can find, the new NCBI core_nt database does not include eukaryotic chromosome assemblies (unless I am mistaken). That may be fine for tools like BLAST, where the annotated genomic regions are typically what matters, but Kraken2 performs a genome-wide k-mer analysis, so I'm struggling to understand how the new default database is suitable for it. Isn't it missing a lot of important information, and doesn't that reduce the accuracy of the taxonomic predictions?

database kraken2 kmer ncbi • 319 views
10 weeks ago
GenoMax 150k

The nt database has become so large that many users don't have the hardware (storage/memory) needed to use it effectively. Even the Kraken2 authors may be struggling to build these indexes in a timely manner. If you have the necessary resources, you could build your own nt Kraken2 database. The program authors provide the pre-made indexes as a courtesy; it is up to the end user to decide whether a given index suits their application.

If you want to maximize genomic content, you may actually want to download RefSeq genomes and representative assemblies (https://www.ncbi.nlm.nih.gov/datasets/genome/) and build your own Kraken2 databases from them.
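
As a rough illustration of what building a custom database from downloaded assemblies could look like, here is a minimal Python sketch. It assumes the `kraken:taxid|<id>` header convention and the `kraken2-build` options (`--download-taxonomy`, `--add-to-library`, `--build`, `--db`, `--threads`) as described in the Kraken2 manual; please check the manual for your installed version before relying on them. The function names, the database name `euk_db`, and the file names are hypothetical, and the commands are only assembled as lists here, not executed.

```python
from typing import Iterable, Iterator, List


def tag_fasta_headers(lines: Iterable[str], taxid: int) -> Iterator[str]:
    """Yield FASTA lines with a kraken:taxid tag added to each sequence ID.

    Assumption: Kraken2 accepts an explicit taxonomy assignment written as
    '>SEQID|kraken:taxid|TAXID description' in the header line.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # Split the sequence ID from the free-text description.
            seq_id, _, description = header.partition(" ")
            tagged = f">{seq_id}|kraken:taxid|{taxid}"
            if description:
                tagged += f" {description}"
            yield tagged + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


def build_commands(db: str, fasta_files: List[str], threads: int = 8) -> List[List[str]]:
    """Assemble (but do not run) the kraken2-build steps for a custom database."""
    commands = [["kraken2-build", "--download-taxonomy", "--db", db]]
    for fasta in fasta_files:
        commands.append(["kraken2-build", "--add-to-library", fasta, "--db", db])
    commands.append(["kraken2-build", "--build", "--db", db, "--threads", str(threads)])
    return commands


if __name__ == "__main__":
    # Hypothetical example: tag one record with the human taxid (9606).
    record = [">NC_000001.11 Homo sapiens chromosome 1\n", "ACGTACGT\n"]
    print("".join(tag_fasta_headers(record, 9606)), end="")

    # Print the commands so they can be reviewed before running them by hand.
    for cmd in build_commands("euk_db", ["genome1.tagged.fa", "genome2.tagged.fa"]):
        print(" ".join(cmd))
```

Tagging headers explicitly is optional when the sequence IDs already carry NCBI accessions that the taxonomy maps can resolve; it is shown here because it avoids depending on the accession lookup for newly assembled or novel genomes. Memory and disk needs grow with the amount of sequence added, so it may be worth starting with a small set of assemblies.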
