Question

How to extract reads that match k-mer profiles from a collection of sequences?

0

6.2 years ago

O.rka ▴ 740

Let's say you have 10 draft genome assemblies from a particular genus, obtained from different sources and totaling 100 contigs.

Are there any tools that let you use this "database" of assemblies to grab any reads whose k-mer usage is even remotely similar to it?

I know about kneaddata, but that maps reads to a very specific reference sequence. I'm looking for a way to extract reads that have similar k-mer usage.

Is there a tool that I can use to do this?

sequencing • 2.2k views

ADD COMMENT • link updated 6.2 years ago by GenoMax 148k • written 6.2 years ago by O.rka ▴ 740

0

mash (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x ), sourmash ( https://sourmash.readthedocs.io/en/latest/ ) or bbsketch.sh from the BBMap suite (https://sourceforge.net/projects/bbmap/ ) can help you with the classification, but I am not sure whether they have functionality to extract those reads.
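To make the idea concrete, here is a minimal pure-Python sketch of k-mer based read extraction. It is not how mash, sourmash or bbsketch.sh work internally (mash and sourmash compare MinHash sketches rather than full k-mer sets), and all function names, the value of k and the `min_fraction` threshold are hypothetical choices for illustration. It collects every canonical k-mer from the contigs into a set, then keeps each read whose fraction of k-mers found in that set reaches the threshold.

```python
from typing import Iterable, Iterator, Set, Tuple

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = frozenset("ACGT")


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield the canonical form of each k-mer in seq.

    The canonical form is the lexicographically smaller of a k-mer and
    its reverse complement, so both strands give the same key.
    K-mers containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if _VALID_BASES.issuperset(kmer):
            revcomp = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, revcomp)


def build_kmer_index(contigs: Iterable[str], k: int) -> Set[str]:
    """Collect all canonical k-mers from the reference contigs."""
    index: Set[str] = set()
    for contig in contigs:
        index.update(canonical_kmers(contig, k))
    return index


def shared_fraction(read: str, index: Set[str], k: int) -> float:
    """Fraction of the read's k-mers that also occur in the index."""
    kmers = list(canonical_kmers(read, k))
    if not kmers:
        return 0.0
    return sum(kmer in index for kmer in kmers) / len(kmers)


def extract_reads(
    reads: Iterable[Tuple[str, str]],
    index: Set[str],
    k: int,
    min_fraction: float,
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs whose shared fraction >= min_fraction."""
    for name, seq in reads:
        if shared_fraction(seq, index, k) >= min_fraction:
            yield name, seq


if __name__ == "__main__":
    # Toy data; real input would be parsed from FASTA/FASTQ files.
    contigs = ["ACGTACGTTGCA"]
    reads = [("read1", "ACGTACGT"), ("read2", "GGGGGGGG")]
    index = build_kmer_index(contigs, k=4)
    for name, seq in extract_reads(reads, index, k=4, min_fraction=0.5):
        print(name, seq)
```

Lowering `min_fraction` makes the filter more permissive, which is closer to the "even remotely similar" requirement. Holding every k-mer in a Python set only scales to small reference collections, which is why the tools above rely on sketches or other compact data structures.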

ADD REPLY • link 6.2 years ago by GenoMax 148k

0

https://github.com/will-rowe/hulk/ ?

ADD REPLY • link 6.2 years ago by shenwei356 8.7k

score 1 · Answer 1 · 2018-10-10

1

6.2 years ago

GenoMax 148k

Cookiecutter (https://github.com/ad3002/Cookiecutter ) seems to do what you need. You will need to test it to confirm.

ADD COMMENT • link 6.2 years ago by GenoMax 148k

0

Is that more for extracting adapters, or can it be extended to entire genome k-mer profiles?

ADD REPLY • link 6.2 years ago by O.rka ▴ 740