Hi all,
I am running Kraken2 on some water eDNA samples. I am getting an unusually high classification rate - only around 35% of my reads are reported as unclassified. I know of other people who run similar analyses on water samples, and they see something like 0.9% of reads being classified.
Is there any reason why this might be? Have I somehow set a low stringency for Kraken2 classification, or am I reading the output wrong? I am looking at the k2report file that was produced and going off that.
Two things to check would be the database against which you run your classification, and the --confidence value (which I think is 0 by default).
Thanks for your reply - do you mean checking the quality of the database sequences? I built it from only RefSeq sequences of a specific list of organisms, so I don't see how this could have caused an issue :(
Sorry I wasn't clear. Since you were comparing your results to others, I suggested that you make sure your DB is comparable to the ones used in other studies. But probably the more important point is to check the confidence value. If I am not mistaken, by default even a single k-mer match is enough to classify a query sequence. So, as suggested by @Istvan, you should probably take a more careful look at the output and determine which taxa your sequences are classified to and with what confidence score.
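In case it helps, here is a minimal Python sketch of that kind of inspection. It is only an illustration and rests on a few assumptions: the function names (read_report, unclassified_percent, top_taxa, kmer_support) are made up for this post, the report is in the standard tab-separated Kraken2 layout (percentage, clade reads, direct reads, rank code, taxid, name last), and the per-read output was written without --use-names. The per-read number is not Kraken2's own confidence score: it only counts k-mers assigned exactly to the read's taxon, so it should be a lower bound on the real score, which also counts k-mers in the clade below that taxon and needs the taxonomy tree to compute.

```python
def read_report(lines):
    """Parse Kraken2 report lines (tab-separated) into a list of dicts.

    Assumes percent, clade reads and direct reads are the first three
    columns and rank code, taxid and name are the last three.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append({
            "percent": float(fields[0]),
            "clade_reads": int(fields[1]),
            "direct_reads": int(fields[2]),
            "rank": fields[-3].strip(),
            "taxid": fields[-2].strip(),
            "name": fields[-1].strip(),  # names are indented in the report
        })
    return rows


def unclassified_percent(rows):
    """Return the percentage on the 'U' (unclassified) row, or 0.0 if absent."""
    for row in rows:
        if row["rank"] == "U":
            return row["percent"]
    return 0.0


def top_taxa(rows, rank="G", n=10):
    """Return the n rows with the given rank code, sorted by clade reads."""
    hits = [row for row in rows if row["rank"] == rank]
    return sorted(hits, key=lambda row: row["clade_reads"], reverse=True)[:n]


def kmer_support(line):
    """Return (read_id, taxid, fraction) for one line of per-read output.

    fraction = k-mers assigned exactly to the read's taxid, divided by all
    non-ambiguous k-mers. It is None for unclassified reads.
    """
    status, read_id, taxid, _length, kmers = line.rstrip("\n").split("\t")[:5]
    if status != "C":
        return read_id, taxid, None
    exact = total = 0
    for token in kmers.split():
        if token == "|:|":  # separator between mates of a paired read
            continue
        tax, count = token.split(":")
        if tax == "A":  # k-mers containing ambiguous nucleotides
            continue
        total += int(count)
        if tax == taxid:
            exact += int(count)
    return read_id, taxid, (exact / total if total else 0.0)
```

You could call read_report on the open k2report file, print unclassified_percent and top_taxa to see where the reads are going, and then run kmer_support over the per-read output to see how many classified reads rest on only a handful of k-mers. If most of them do, rerunning with a non-zero --confidence would be the next thing to try.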