Here is my problem. I have a reference file containing all 32-mers found in the human transcriptome, and a search file of 32-mers from short-read sequencing data. I would like to determine whether any of the 32-mers in the search file are present in the reference file. The search needs to be approximate rather than exact, because I expect the 32-mers from the sequencing data to contain errors and, in some cases, variants. Do you know of a tool that can solve my problem? Thank you.
NCBI uses STAT to search SRA submissions to generate taxonomy checks (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02490-0).
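
To make the "approximate" part of the question concrete, here is a minimal Python sketch of one generic way to do it. This is not how STAT works; it is a plain pigeonhole seed-and-verify lookup, and it assumes that "approximate" means at most a few substitutions (Hamming distance), with no insertions or deletions. All function names (`build_index`, `find_matches`, and so on) are made up for illustration. The idea is that if two k-mers differ in at most d positions and you cut them into d+1 chunks, at least one chunk must match exactly, so you can index the reference by chunk and only verify the candidates.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def chunk_bounds(k, parts):
    """Split the range [0, k) into `parts` contiguous, near-equal chunks."""
    step, extra = divmod(k, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def build_index(ref_kmers, max_mismatches):
    """Index every reference k-mer by each of its (max_mismatches + 1) chunks."""
    ref_kmers = list(ref_kmers)
    k = len(ref_kmers[0])
    bounds = chunk_bounds(k, max_mismatches + 1)
    index = defaultdict(list)
    for kmer in ref_kmers:
        for i, (s, e) in enumerate(bounds):
            index[(i, kmer[s:e])].append(kmer)
    return bounds, index


def find_matches(query, index, bounds, max_mismatches):
    """Return (reference k-mer, distance) pairs within max_mismatches of query."""
    hits = {}
    for i, (s, e) in enumerate(bounds):
        # Any reference within the mismatch budget shares at least one chunk.
        for ref in index.get((i, query[s:e]), ()):
            if ref not in hits:
                d = hamming(query, ref)
                if d <= max_mismatches:
                    hits[ref] = d
    return sorted(hits.items())


if __name__ == "__main__":
    # Toy reference; a real one would be read from the reference file.
    reference = ["ACGT" * 8, "TTTT" * 8]
    bounds, index = build_index(reference, max_mismatches=2)
    read_kmer = "ACGTAAGT" + "ACGT" * 6  # one substitution vs. the first k-mer
    print(find_matches(read_kmer, index, bounds, max_mismatches=2))
```

A Python dictionary like this will not fit every 32-mer of the human transcriptome in memory on a typical machine, so treat it as an illustration of the lookup logic rather than something to run at that scale.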