I'm searching for a way to find very long k-mers (k ~ 2000). I realize that the probability of an exact match even at 1000 nt is quite low, so I'm looking to search genome-wide for long k-mers with gaps allowed, with the minimum requirement that at least 1200 bp match in each discovered motif.
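To make the 1200-bp criterion concrete, here is a minimal sketch of how it could be checked on a gapped pairwise alignment. It assumes that "match" means identical bases in the same alignment column (an assumption; the criterion could also be read as "aligned columns regardless of identity") and that the input is two equal-length alignment rows with `-` for gaps. The function names `matched_columns` and `meets_threshold` are hypothetical, not part of any tool.

```python
def matched_columns(row_a: str, row_b: str, skip: str = "-N") -> int:
    """Count alignment columns where both rows carry the same base.

    Columns where the shared character is a gap ('-') or N are not counted.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    return sum(
        1
        for a, b in zip(row_a.upper(), row_b.upper())
        if a == b and a not in skip
    )


def meets_threshold(row_a: str, row_b: str, min_matches: int = 1200) -> bool:
    """True if the gapped alignment has at least `min_matches` identical columns."""
    return matched_columns(row_a, row_b) >= min_matches


if __name__ == "__main__":
    # Toy rows with a small threshold; real rows would be ~2000 columns long.
    print(matched_columns("ACGT-ACGT", "ACGTTAC-T"))  # 7
    print(meets_threshold("ACGT-ACGT", "ACGTTAC-T", min_matches=5))  # True
```

Counting identical columns rather than aligned columns is the stricter reading; dropping the `a == b` test would give the looser one.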
So far, I've tried to prototype this with glam2, but it never converges. I first split human chromosome 21 into 1 Mb chunks, saving each chunk as a separate sequence on its own line. I then ask glam2 to find local alignments of ~2000 bp across these chunks:
glam2 n chr21_chunked1000000.fa -z 10 -a 1200 -b 2000 -w 1500 &
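The chunking step described above could be scripted roughly as follows. This is a minimal sketch, not the script actually used: the file names, the `chr21:start-end` header format, and the choice to drop chunks that consist entirely of N (assembly gaps, which chromosome 21 contains) are all assumptions. Each chunk is written as one FASTA record with the whole sequence on a single line.

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_sequence(name: str, seq: str, size: int = 1_000_000,
                   skip_all_n: bool = True) -> List[Tuple[str, str]]:
    """Split one sequence into consecutive, non-overlapping chunks of `size` bp.

    Headers get 1-based inclusive coordinates, e.g. chr21:1000001-2000000.
    Chunks made up entirely of N are dropped when skip_all_n is True.
    """
    chunks = []
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        if skip_all_n and set(chunk.upper()) == {"N"}:
            continue
        end = start + len(chunk)
        chunks.append((f"{name}:{start + 1}-{end}", chunk))
    return chunks


def format_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Render records as FASTA, keeping each sequence on a single line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python chunk_fasta.py chr21.fa > chr21_chunked1000000.fa
    with open(sys.argv[1]) as handle:
        for header, seq in read_fasta(handle):
            sys.stdout.write(format_fasta(chunk_sequence(header.split()[0], seq)))
```

Non-overlapping chunks mean a ~2000 bp motif that straddles a chunk boundary will be split; overlapping the chunks by at least the motif length would avoid that, at the cost of some duplicated sequence.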
I wonder whether there is already a tool better suited to long k-mer/motif discovery. Any recommendations or advice would be greatly appreciated!
-G