Rapid k-mer search
3 months ago
Anjan

Here is my problem. I have a reference file containing all 32-mers found in the human transcriptome, and I will have a search file of 32-mers from short-read sequencing data. I would like to determine whether any of the 32-mers in the search file are present in the reference file. The search needs to be approximate, since I expect the 32-mers from sequencing data to contain errors and, in some cases, variants. Do you know of a tool that can solve this problem? Thank you.
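A minimal sketch of the simplest form of this search, assuming errors are substitutions only (no indels), at most one mismatch per 32-mer, and that both files store k-mers on the same strand. A 32-mer fits exactly into 64 bits at 2 bits per base, so the reference can be held as a set of integers and each query checked together with its 96 single-substitution neighbours. The function names (`encode`, `neighbours`, `approx_hits`) are illustrative, not from any existing tool.

```python
# Illustrative sketch: substitutions only, at most one mismatch, same strand.
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(kmer: str) -> int:
    """Pack a DNA k-mer (k <= 32) into an int, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENC[base]
    return value

def neighbours(code: int, k: int):
    """Yield every k-mer code at Hamming distance exactly 1 from `code`."""
    for pos in range(k):
        shift = 2 * (k - 1 - pos)
        current = (code >> shift) & 3
        for base in range(4):
            if base != current:
                # Clear the 2 bits for this position, then write the new base.
                yield (code & ~(3 << shift)) | (base << shift)

def approx_hits(reference, queries, k=32):
    """Return {query: 'exact' | 'one_mismatch'} for queries found in reference."""
    ref = {encode(s) for s in reference}
    hits = {}
    for q in queries:
        code = encode(q)
        if code in ref:
            hits[q] = "exact"
        elif any(n in ref for n in neighbours(code, k)):
            hits[q] = "one_mismatch"
    return hits
```

For example, `approx_hits(["ACGT" * 8], ["TCGT" + "ACGT" * 7])` reports the query as `one_mismatch`. Enumerating neighbours is cheap at one mismatch (3k lookups per query) but grows combinatorially with more mismatches, and a Python set is used only for clarity; a transcriptome-scale reference would call for a more compact structure.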

Tags: search, k-mer

NCBI uses STAT to screen SRA submissions and generate taxonomy checks (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02490-0).

3 months ago

Since you already have both the reference and query k-mer sets, you would need a technique that supports approximate k-mer matching, such as simhash.

ref
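A rough sketch of the simhash idea applied to a single k-mer, assuming overlapping 4-mers as features and BLAKE2b as the per-feature hash (both are illustrative choices, not taken from the linked reference). Each feature votes on every bit of a 64-bit fingerprint; because one substitution alters at most 4 of the 29 overlapping 4-mers in a 32-mer, similar k-mers tend to receive fingerprints with a small Hamming distance.

```python
import hashlib

def feature_hash(feature: str, bits: int = 64) -> int:
    """Deterministic hash of a string feature (BLAKE2b, bits // 8 byte digest)."""
    digest = hashlib.blake2b(feature.encode(), digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")

def simhash(kmer: str, q: int = 4, bits: int = 64) -> int:
    """Simhash fingerprint of a k-mer, using its overlapping q-grams as features."""
    votes = [0] * bits
    for i in range(len(kmer) - q + 1):
        h = feature_hash(kmer[i:i + q], bits)
        # Each feature adds +1 to bits that are set in its hash, -1 otherwise.
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    fingerprint = 0
    for b in range(bits):
        if votes[b] > 0:
            fingerprint |= 1 << b
    return fingerprint

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Matches found this way are probabilistic candidates: a real index would bucket fingerprints (for example by splitting them into blocks) to find near neighbours quickly, and candidates should then be verified by comparing the k-mers directly.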

3 months ago

Are you looking for a tool or a library? I think this paper and some of its references would be a good starting point for the fundamentals, and HyperGen looks like a promising new library. As a ready-to-use tool, Sylph may be what you are looking for, but it requires FASTA/FASTQ input rather than precomputed k-mers.
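As a rough illustration of the k-mer sketching idea behind tools of this kind, here is a generic FracMinHash-style sketch, assuming BLAKE2b as the k-mer hash and an illustrative scale factor of 200; it is not Sylph's or HyperGen's implementation. Only k-mers whose hash falls below 2^64 / scale are kept, and the shared fraction of the kept hashes estimates containment. Note that this answers a set-level question (how much of one k-mer set is contained in another) rather than performing per-k-mer approximate lookups.

```python
import hashlib

def h64(kmer: str) -> int:
    """64-bit hash of a k-mer string (BLAKE2b with an 8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def frac_sketch(kmers, scale: int = 200) -> set:
    """Keep only k-mer hashes below 2**64 // scale (roughly 1/scale of them)."""
    threshold = (1 << 64) // scale
    return {h for h in map(h64, kmers) if h < threshold}

def containment(query_sketch: set, ref_sketch: set) -> float:
    """Estimated fraction of query k-mers that are also in the reference."""
    if not query_sketch:
        return 0.0
    return len(query_sketch & ref_sketch) / len(query_sketch)
```

With `scale=1` every k-mer is kept and the estimate becomes the exact containment, which is handy for checking the code; larger scales trade accuracy for a much smaller sketch.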
