Adding new taxa to a Kraken2 db
3.0 years ago
valentinavan ▴ 50

Hi, can someone please check whether the following steps are correct?

I am trying to add a few genomes that I downloaded from the NCBI website (alnus_glutinosa_GCA_003254965.1.fna, carpinus_fangiana_GCA_006937295.1.fna, etc.) to my plant Kraken2 db ("plant_original").

for file in *.fna
do
    kraken2-build --add-to-library "$file" --db PATH/kraken/plant_original
done

Masking low-complexity regions of new file... done.

Added "alnus_glutinosa_GCA_003254965.1.fna" to library (PATH/kraken/plant_original)

Masking low-complexity regions of new file... done.

Added "carpinus_fangiana_GCA_006937295.1.fna" to library (PATH/kraken/plant_original)

Masking low-complexity regions of new file... done.

kraken2-build --build --db ~/kraken/plant_original

Creating sequence ID to taxonomy ID map (step 1)...

Sequence ID to taxonomy ID map already present, skipping map creation.

Estimating required capacity (step 2)...

Estimated hash table requirement: 73390180936 bytes

Capacity estimation complete. [1h53m11.000s]

Building database files (step 3)...

Hash table already present, skipping database file build.

Database construction complete. [Total: 1h53m11.000s]

Then I ran Kraken2 on one of my samples against the updated plant db (I did not rename the db; it is still called plant_original):

kraken2 --db PATH/kraken/plant_original --threads 8 --confidence 0.1 --report PATH/SB0new_report.txt PATH/SB0.fastq.gz --report-zero-counts --output PATH/SB0new_taxa.txt

However, the new Kraken2 report is exactly the same as the old one, with no hits to any of the newly added taxa.

Previously, I ran a blastn alignment of this sample's reads against the exact same genomes (I created a small db with alnus_glutinosa_GCA_003254965.1.fna, carpinus_fangiana_GCA_006937295.1.fna, etc.) and blastn found some hits, so I expected Kraken2 to find these matches too.

Are these steps wrong or am I making some conceptual mistake?

I appreciate your help.

Thanks

plants database blast kraken2 • 3.3k views

I think I'm running into a similar problem. Have you found a solution yet? I'm thinking I have to delete all the .k2d files and rebuild everything with the new genomes included (which I think will take a long time).
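A minimal sketch for checking which leftover build products are present in a db directory. The build log in the question reports "Sequence ID to taxonomy ID map already present" and "Hash table already present", which suggests that an existing seqid2taxid.map and hash.k2d are what make `kraken2-build --build` skip its steps. The function name `stale_build_files` and the `DB_DIR` variable are made up for this example, and the file list is an assumption based on the usual Kraken2 db layout. The sketch only lists files and deletes nothing.

```shell
# stale_build_files DB_DIR
# Print the build products from a previous run that exist in DB_DIR.
# The file names are an assumption based on the usual Kraken2 db layout.
stale_build_files() {
    for name in seqid2taxid.map hash.k2d opts.k2d taxo.k2d; do
        if [ -e "$1/$name" ]; then
            printf '%s\n' "$1/$name"
        fi
    done
}

# DB_DIR is a hypothetical variable; it defaults to the current directory.
stale_build_files "${DB_DIR:-.}"
```

If files are listed, removing them before re-running the build would only help when the library/ and taxonomy/ directories are still in the db folder; if the db was cleaned or downloaded prebuilt, the sequences needed for a rebuild are no longer there.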

3.0 years ago
Mark ★ 1.6k

You cannot add genomes to an existing, precomputed Kraken2 db. You have to download the plant genomes, add the new genomes, and then recreate the db.
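The steps above could be sketched roughly as follows. This assumes the original db came from the standard plant library (`--download-library plant`), which the thread does not confirm. The names `DB`, `GENOMES_DIR`, `THREADS`, `DRY_RUN`, `run` and `rebuild_db` are hypothetical. By default the script only prints the commands instead of running them.

```shell
# Sketch: build a fresh Kraken2 plant db that includes extra genomes.
# DB, GENOMES_DIR, THREADS and DRY_RUN are hypothetical names for this example.
DB="${DB:-plant_custom}"
GENOMES_DIR="${GENOMES_DIR:-.}"
THREADS="${THREADS:-8}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_db() {
    # 1. NCBI taxonomy and the standard plant library
    run kraken2-build --download-taxonomy --db "$DB"
    run kraken2-build --download-library plant --db "$DB"
    # 2. Add each extra genome before any build has happened
    for file in "$GENOMES_DIR"/*.fna; do
        [ -e "$file" ] || continue   # skip if the glob matched nothing
        run kraken2-build --add-to-library "$file" --db "$DB"
    done
    # 3. Build the hash table once, with everything in the library
    run kraken2-build --build --threads "$THREADS" --db "$DB"
}

rebuild_db
```

Using a new db name keeps the old db intact for comparison. Set `DRY_RUN=0` only after checking the printed commands, since the download and build steps can take hours and need a lot of disk and memory.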


Hi Mark,

thanks for replying. Is that not what I have done? I downloaded the genomes and added them to the db. Can you please be more specific so that I can understand? Thanks


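Edit: Mark is right. You added new files to an existing library, but your build log says "Hash table already present, skipping database file build", so the new sequences never made it into the hash table. Do you have the FASTA files of the initial plant database?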


Hi,

Thanks for your help. I built a custom database from scratch and tested Kraken2 with a sample, and I got matches. However, building the db also produced an unmapped.txt file with a long list of accession numbers. I am not sure how to treat this...

Please see this other post: Custom Kraken2 db
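One possible way to handle sequences listed in unmapped.txt, sketched under assumptions: Kraken2 accepts a taxonomy ID written directly in the FASTA header as `|kraken:taxid|<ID>` after the sequence ID, so a genome whose accessions cannot be mapped could be tagged by hand before `--add-to-library`. The function name `tag_fasta_taxid` is made up for this example, and the taxid has to be looked up in the NCBI taxonomy for each genome; none is supplied here.

```shell
# tag_fasta_taxid TAXID < in.fna > out.fna
# Append "|kraken:taxid|TAXID" to the sequence ID of every FASTA header,
# keeping the rest of the header (the description) unchanged.
tag_fasta_taxid() {
    awk -v taxid="$1" '
        /^>/ {
            id = $1                            # ">accession", first word
            rest = substr($0, length(id) + 1)  # description, may be empty
            print id "|kraken:taxid|" taxid rest
            next
        }
        { print }                              # sequence lines pass through
    '
}
```

Usage would look like `tag_fasta_taxid TAXID < genome.fna > genome.tagged.fna`, with `TAXID` replaced by the species' real NCBI taxonomy ID, followed by `kraken2-build --add-to-library genome.tagged.fna --db ...` on a db that has not been built yet.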
