I am trying to generate a summary of parasites in human metagenomic samples.
I have been looking into Kraken2 and its databases built from RefSeq (which should contain all annotated parasite sequences). I have also noticed this DB available from EuPathDB:
Personally, I think that if you're looking at parasites specifically, a better option is to build a protein dataset and run DIAMOND blastx with your DNA against it. I think that will yield better results than k-mer-based methods.
Using CAT (which in turn uses diamond), and then parsing its contig2classification output files by taxonomy may be helpful; everything with a taxonomy tree that descends towards parasites could then be eliminated.
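As a rough sketch of that parsing step, something like the following could pull out the contigs whose lineage passes through a clade of interest. It assumes the contig2classification file is tab-separated with the lineage in the fourth column as semicolon-separated NCBI taxids (possibly with a trailing `*`), which is worth checking against your CAT version. The function name `contigs_in_clades`, the example rows, and the taxid list are hypothetical; the taxids are meant to be Apicomplexa, Nematoda and Platyhelminthes, but verify them in NCBI Taxonomy before relying on them.

```python
# Illustrative clade taxids (assumed, verify against NCBI Taxonomy):
# 5794 Apicomplexa, 6231 Nematoda, 6157 Platyhelminthes
PARASITE_CLADES = {5794, 6231, 6157}


def contigs_in_clades(lines, clade_taxids):
    """Return contig names whose lineage contains any taxid in clade_taxids.

    `lines` is any iterable of text lines in the assumed layout:
    contig <TAB> classification <TAB> reason <TAB> lineage <TAB> lineage scores
    """
    wanted = {str(t) for t in clade_taxids}
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header line and blank lines.
        if not fields[0] or fields[0].startswith("#"):
            continue
        # Contigs without an assigned taxid have no lineage column.
        if len(fields) < 4 or not fields[3]:
            continue
        # Strip the optional '*' marker from each taxid in the lineage.
        lineage = {taxid.rstrip("*") for taxid in fields[3].split(";")}
        if lineage & wanted:
            hits.append(fields[0])
    return hits


# Made-up rows, only to show the expected shape of the input.
example = [
    "# contig\tclassification\treason\tlineage\tlineage scores\n",
    "contig_1\ttaxid assigned\tbased on 3/3 ORFs\t1;131567;2;1224\t1.00;1.00;1.00;0.90\n",
    "contig_2\ttaxid assigned\tbased on 2/2 ORFs\t1;131567;2759;5794*\t1.00;1.00;1.00;0.60\n",
    "contig_3\tno taxid assigned\tno ORFs found\n",
]
print(contigs_in_clades(example, PARASITE_CLADES))  # ['contig_2']
```

For a real run you would pass an open file handle instead of the example list. The returned contig names can then be either summarised or removed, depending on whether you want to report the parasites or filter them out.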
Could you explain why? Do parasites tend to have higher mutation rates?
Me today is trying to understand what me six months ago meant. I think that it's true for everything, not just parasites.