8.2 years ago
abascalfederico
Hi all,
I need to identify overrepresented k-mers in sequencing data. Ideally, I would need k-mers of lengths between 7 and 20 (I am searching for remnants of sequencing adaptors).
Does anyone know of a program able to do this?
Thanks! Federico
Thanks Pierre! That may be helpful, but I would like to be able to search for longer k-mers (up to 20 bp).
fastqc is a shell script, change the following lines:
use at your own risk.
Minimum at 2 obviously makes sense. Any idea why they hard-coded the maximum at 10?
Because 10 is not 'too much' in memory: there are potentially 4^10 = 1,048,576 unique keys in the map. k=20 would be 4^20 = 1,099,511,627,776 keys ==> OUT OF MEMORY.
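To make the arithmetic concrete, here is a minimal Python sketch. It is not how FastQC does it internally; the function names `kmer_key_space` and `count_kmers` and the toy reads are made up for illustration. It prints the size of the key space for k=10 and k=20, then counts only the k-mers that actually occur in the reads, so the map holds one entry per observed k-mer rather than one per possible k-mer.

```python
from collections import Counter


def kmer_key_space(k: int) -> int:
    """Number of possible DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def count_kmers(reads, k):
    """Count only the k-mers that actually occur in the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip windows containing ambiguous bases
                counts[kmer] += 1
    return counts


# Hypothetical toy reads that share a common suffix (illustration only).
reads = [
    "ACGTACGTAGATCGGAAGAGC",
    "TTGCAAGATCGGAAGAGC",
    "GGGCCCAGATCGGAAGAGC",
]

for k in (10, 20):
    print(k, kmer_key_space(k))

# The k-mers seen most often are candidates for adaptor remnants.
print(count_kmers(reads, 12).most_common(2))
```

Even this way, memory still grows with the number of distinct k-mers observed, so on a full sequencing run with k=20 it would probably be wise to count on a subsample of reads first.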