5.7 years ago
jyu429
Hi,
Is there a tool that not only counts k-mers (like jellyfish) but also lists their positions? I know it's much more demanding memory-wise, but I'm looking for the best way to do this, even if no such tool currently exists.
Thanks!
Take a look at Finding 16 mer not present in GRCh38. One suggestion there was to use bowtie to align the k-mers against the genome. I would do the alignment and then filter for matches with 100% sequence identity. It might help to set the gap-opening and mismatch penalties very high (something like 10000) so that only perfect matches are retained.
Is that really faster than, for example, implementing a search trie?
How large are your k-mers? Do you need all combinations? All occurrences? I used to write Perl scripts for k-mer counting (k = 8-12) that also recorded positions, for cis-regulatory elements in some plant genomes, so it is not hard to do, even on the 2 GB RAM machine I had.
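To illustrate the scripted approach, here is a minimal sketch in Python (not the original Perl) that records every start position of every k-mer in a single sequence. The function name `kmer_positions` and the toy sequence are made up for illustration. It assumes the sequence fits in memory, uses 0-based positions on the forward strand only, and skips windows containing `N`.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of 0-based start positions where it occurs."""
    index = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            # Skip windows that contain ambiguous bases.
            continue
        index[kmer].append(i)
    return index


if __name__ == "__main__":
    # Toy sequence, purely illustrative.
    index = kmer_positions("ACGTACGTAC", 4)
    for kmer in sorted(index):
        # Print the k-mer, its count, and its positions.
        print(kmer, len(index[kmer]), index[kmer])
```

The count of a k-mer is just the length of its position list, so this gives both pieces of information in one pass. Memory grows with the number of positions stored, so for a whole genome you would likely need to process one chromosome at a time or restrict the index to the k-mers of interest.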