Let's say you had 10 draft genome assemblies of a particular genus from different sources, with 100 contigs altogether.
Are there any tools that allow you to use this "database" of assemblies to then grab any reads that have even a remotely similar k-mer usage to the "database"?
I know about kneaddata, but that maps reads to a very specific reference sequence; I'm looking for a way to extract reads that have similar k-mer usage.
Is there a tool that I can use to do this?
You can use mash (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x), sourmash (https://sourmash.readthedocs.io/en/latest/), or bbsketch.sh from the BBMap suite (https://sourceforge.net/projects/bbmap/). These can help you with the classification, but I am not sure if they have functionality to extract those reads.
https://github.com/will-rowe/hulk/ ?
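
To make the idea in the question concrete, here is a minimal sketch in plain Python of k-mer-based read extraction: index every k-mer in the contig "database", then keep any read whose fraction of shared k-mers reaches a threshold. This is not how mash, sourmash, bbsketch.sh, or hulk work internally; the function names, the `k` value, and the `min_fraction` threshold are all assumptions chosen for illustration, and an exact k-mer set like this will not scale to large read sets the way sketching tools do.

```python
from typing import Iterable, Iterator, Set, Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield the strand-independent (canonical) form of each k-mer in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguity codes
        yield min(kmer, reverse_complement(kmer))


def build_index(contigs: Iterable[str], k: int) -> Set[str]:
    """Collect every canonical k-mer seen in the contig 'database'."""
    index: Set[str] = set()
    for contig in contigs:
        index.update(canonical_kmers(contig, k))
    return index


def shared_fraction(read: str, index: Set[str], k: int) -> float:
    """Fraction of the read's k-mers that also occur in the index."""
    kmers = list(canonical_kmers(read, k))
    if not kmers:
        return 0.0  # read shorter than k, or made only of ambiguous bases
    return sum(kmer in index for kmer in kmers) / len(kmers)


def extract_reads(
    reads: Iterable[Tuple[str, str]],
    index: Set[str],
    k: int,
    min_fraction: float,
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs whose shared k-mer fraction meets the threshold."""
    for name, seq in reads:
        if shared_fraction(seq, index, k) >= min_fraction:
            yield name, seq


# Toy data (hypothetical): one contig, three reads.
K = 5
contigs = ["ACGTACGTTAGCCGATAGGCTA"]
reads = [
    ("read_fwd", "CGTTAGCCGATA"),                      # taken from the contig
    ("read_rc", reverse_complement("CGTTAGCCGATA")),   # same region, other strand
    ("read_unrelated", "TTTTTTTTTTTT"),                # shares no k-mers
]

index = build_index(contigs, K)
kept = [name for name, _ in extract_reads(reads, index, K, min_fraction=0.1)]
print(kept)
```

Lowering `min_fraction` makes the filter more permissive, which is closer to the "even remotely similar" behaviour asked about, at the cost of pulling in more unrelated reads. Real input would be parsed from FASTA/FASTQ files rather than hard-coded strings.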