Hello,
This is a question for metagenomics experts: I am using Kraken2 for taxonomic assignment.
When I test it on simulated datasets or on real raw sequencing data, I see many species and genera that are identified from only a very small number of sequences (under 0.05% of the reads). I know that most of them are false positives, but I don't know how to identify them.
Can I just filter by percentages or should I filter by the number of assigned reads? Or are there other criteria to filter?
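To make the question concrete, here is a minimal sketch of the simple filter I have in mind. It assumes the standard six-column, tab-separated Kraken2 report (percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxid, name). The function name `filter_report`, the thresholds, and the two example rows are placeholders I made up for illustration, not recommended values or real results.

```python
import csv

def filter_report(lines, min_percent=0.05, min_reads=10, ranks=("G", "S")):
    """Keep genus/species rows that pass BOTH a percentage and a read-count cutoff.

    `lines` is an iterable of lines from a standard 6-column Kraken2 report.
    The default thresholds are arbitrary placeholders, not validated cutoffs.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        percent = float(row[0])    # % of reads in the clade rooted at this taxon
        clade_reads = int(row[1])  # number of reads in that clade
        rank = row[3]              # rank code, e.g. "G" (genus) or "S" (species)
        name = row[5].strip()      # names are indented with spaces in the report
        if rank in ranks and percent >= min_percent and clade_reads >= min_reads:
            kept.append((name, rank, clade_reads, percent))
    return kept

# Made-up example rows in the report layout (not real output).
example = (
    " 60.00\t6000\t6000\tS\t562\t        Escherichia coli\n"
    "  0.02\t2\t2\tS\t1280\t        Staphylococcus aureus\n"
)

for taxon in filter_report(example.splitlines()):
    print(taxon)
```

With a real report, I would pass the open file handle instead of `example.splitlines()`. What I don't know is whether cutoffs like these are defensible at all, or whether they should depend on sequencing depth.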
If there are other criteria and the problem is more complicated than applying simple thresholds, can you recommend tools that can help me identify and remove those false positives?
Thanks in advance
Have a good day!