kraken2 bacteria database 250GB+
12 months ago
10mz1 ▴ 10

Hi all,

I've been trying to get Kraken2 installed locally, and the main database is clearly far too large for my machine. I got the 16S Greengenes database working fine, so now I'd like to try a larger one: the entire bacterial database. I don't understand why it is so huge on my local PC. For example, the Kraken 2 manual states that the entire standard database (which consists of the bacterial library plus others, if I am not mistaken) should take around 100GB. I tried downloading and building only the bacterial database locally; it reached 254GB and then failed because the disk filled up completely. What gives?

metagenomics 16s kraken2 kraken • 2.0k views

From where, and using what criteria, did you get the bacterial genomes/sequences? The standard pre-built Kraken database seems to restrict itself to RefSeq and is only 70GB.

I followed the command here: https://github.com/DerrickWood/kraken2/wiki/Manual#custom-databases

kraken2-build --download-library bacteria --db $DBNAME
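
A rough sketch of how one might guard against filling the disk partway through: check the free space on the target filesystem before starting the download. The helper names `free_gb` and `has_free_space`, the path, and the 300GB threshold are illustrative assumptions, not part of Kraken 2. The manual does document `kraken2-build --clean` for removing intermediate files once a build has succeeded.

```shell
# Sketch only: helper names and the threshold below are assumptions.

# Print the free space, in whole GB, on the filesystem holding directory $1.
# df -P forces the one-line-per-filesystem POSIX format; -k reports 1024-byte blocks.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if directory $1 has at least $2 GB free.
has_free_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Example use (commented out so the sketch has no side effects):
# DBNAME=/path/to/db      # hypothetical location
# REQUIRED_GB=300         # assumed headroom, not a figure from the manual
# if has_free_space "$(dirname "$DBNAME")" "$REQUIRED_GB"; then
#     kraken2-build --download-library bacteria --db "$DBNAME"
# else
#     echo "Less than ${REQUIRED_GB}GB free; not starting the download." >&2
# fi
# After a successful build, intermediate files can be removed with:
# kraken2-build --clean --db "$DBNAME"
```

Running `free_gb .` on its own prints how much room the current filesystem has left.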

I guess the bacterial content at NCBI could have grown significantly since the manual was written.

It is a good idea to work with as small a database as possible to get the job done. Still, 6-10 TB hard drives are available for $150-300. Given that affordability, I think no research should suffer these days because of disk space.

If 250GB of data filled up OP's local disk then it is difficult to imagine that there is enough RAM available to go with the disk.
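
To make the RAM concern concrete: by default Kraken 2 loads the database into memory (the manual documents a `--memory-mapping` switch to avoid that), so the built `*.k2d` files need to fit in RAM for classification at full speed. A minimal sketch, assuming a Linux machine with `/proc/meminfo`, could compare the two sizes. The function names and the path are hypothetical.

```shell
# Sketch only: function names are assumptions; total_ram_kb is Linux-specific.

# Total physical RAM in KB, read from /proc/meminfo.
total_ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Combined on-disk size, in KB, of the files given as arguments.
# du -c appends a grand-total line; the awk keeps the last value seen.
files_kb() {
    du -kc "$@" | awk '{ kb = $1 } END { print kb }'
}

# Succeed if a database of $1 KB fits into $2 KB of RAM.
fits_in_ram() {
    [ "$1" -le "$2" ]
}

# Example use (commented out; requires an already built database):
# DBNAME=/path/to/db      # hypothetical location
# if fits_in_ram "$(files_kb "$DBNAME"/*.k2d)" "$(total_ram_kb)"; then
#     echo "Database should fit in RAM."
# else
#     echo "Database is larger than RAM; consider --memory-mapping or a smaller database." >&2
# fi
```

This compares against total RAM rather than currently free RAM, so it is an optimistic check.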

"It is a good idea to work with as small a database as possible to get the job done." What is your argument for this? One could argue that a larger database performs better, as shown for example by Pochon et al. 2023 (Figure 8): https://link.springer.com/content/pdf/10.1186/s13059-023-03083-9.pdf
