Question

High classification percentage on Kraken2?

20 months ago

Natalia • 0

Hi all,

I am running Kraken2 on some water eDNA samples. I am getting an unusually high classification rate: only around 35% of my reads are reported as unclassified. I know of other people who run similar analyses on water samples, and only about 0.9% of their reads are classified.

Is there any reason why this might be? Have I somehow set a low stringency for Kraken2 classification, or am I reading the output wrong? I am going by the k2report file that was produced.

kraken2 metagenomics • 1.8k views

Two things to check would be the database against which you run your classification, and the --confidence value (which I think is 0 by default).

20 months ago by liorglic ★ 1.5k

Thanks for your reply - do you mean checking the quality of the database sequences? I built it from only RefSeq sequences of a specific list of organisms, so I don't see how this could have caused an issue :(

20 months ago by Natalia • 0

Sorry I wasn't clear. Since you were comparing your results to others', I suggested making sure your DB is comparable to the ones used in the other studies. But probably the more important point is to check the confidence value. If I am not mistaken, with the default setting even a single k-mer match is enough to classify a query sequence. So, as suggested by @Istvan, you should probably take a more careful look at the output and determine which taxa your sequences are assigned to and with what confidence score.
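One way to test whether the threshold explains the high classification rate is to re-run the same reads at a few --confidence values and compare the resulting reports. Below is a minimal sketch in Python that only builds the command lines. The database directory, read file, and output prefix are placeholders, and the three threshold values are arbitrary examples, not recommendations.

```python
import shlex


def kraken2_command(db, reads, confidence, prefix):
    """Build a kraken2 command line for one confidence threshold.

    db, reads and prefix are hypothetical paths supplied by the caller.
    """
    return [
        "kraken2",
        "--db", db,
        "--confidence", str(confidence),
        "--report", f"{prefix}.conf{confidence}.k2report",
        "--output", f"{prefix}.conf{confidence}.kraken2",
        reads,
    ]


if __name__ == "__main__":
    # Print one command per threshold so the runs can be compared afterwards.
    for conf in (0.0, 0.1, 0.5):
        cmd = kraken2_command("my_db", "sample.fastq", conf, "sample")
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to actually launch the run. If the unclassified fraction rises sharply as soon as the threshold moves above 0, many of the original assignments probably rested on very few k-mers.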

20 months ago by liorglic ★ 1.5k

score 2 · Answer 1 · 2023-08-24

20 months ago

Istvan Albert 102k

Kraken 2 can generate per-sequence output that lists, for every sequence, which taxonomic hits it saw and how the lowest common ancestor (LCA) was determined:

https://github.com/DerrickWood/kraken2/wiki/Manual#confidence-scoring

That can give you a good sense of how accurate the classifications are.
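As a rough illustration, the per-read output can be summarised with a few lines of Python. This sketch assumes the standard tab-separated output (status, read ID, taxid, length, LCA mapping string) with numeric taxids, i.e. produced without --use-names. The function name kmer_support is made up for this example. Note that the confidence score described in the manual counts k-mers in the whole clade rooted at the assigned taxon, which needs the taxonomy tree. The "exact" fraction below ignores descendants, so it is only a lower bound on that score.

```python
def kmer_support(line):
    """Summarise k-mer evidence for one line of Kraken 2 per-read output.

    Returns (status, read_id, taxid, exact_fraction, hit_fraction), where
    exact_fraction is the share of unambiguous k-mers assigned exactly to the
    called taxid and hit_fraction is the share with any database hit.
    """
    status, read_id, taxid, _length, lca = line.rstrip("\n").split("\t")[:5]
    exact = hits = total = 0
    for token in lca.split():
        key, _, count = token.rpartition(":")
        if key in ("A", "|"):  # ambiguous k-mers and the paired-end separator
            continue
        n = int(count)
        total += n
        if key != "0":  # taxid 0 marks k-mers with no database hit
            hits += n
        if key == taxid:
            exact += n
    if total == 0:
        return status, read_id, taxid, 0.0, 0.0
    return status, read_id, taxid, exact / total, hits / total


if __name__ == "__main__":
    # Made-up line: 2 k-mers hit taxid 562, 4 hit 561, 60 hit nothing.
    example = "C\tread1\t562\t100\t562:2 0:60 A:4 561:4"
    print(kmer_support(example))
```

Reads that are marked as classified but have a very small hit fraction are the ones most likely to become unclassified once a non-zero --confidence is applied.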
