Software to identify overrepresented k-mers in sequencing data
8.2 years ago
abascalfederico ★ 1.2k

Hi all,

I need to identify overrepresented k-mers in sequencing data. Ideally, the k-mers would be between 7 and 20 bases long (I am searching for remnants of sequencing adaptors).

Does anyone know of a program that can do this?

Thanks! Federico

8.2 years ago

FastQC with the -k/--kmers option: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/11%20Kmer%20Content.html

   -k --kmers       Specifies the length of Kmer to look for in the Kmer content
                    module. Specified Kmer length must be between 2 and 10. Default
                    length is 7 if not specified.

Thanks Pierre! That may be helpful, but I would like to be able to search for longer k-mers (up to 20 bp).
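
Longer k-mers can also be counted outside FastQC by storing only the k-mers that actually occur in the reads. What follows is a minimal sketch, not a replacement for a dedicated k-mer counter: it assumes plain, uncompressed 4-line FASTQ records, and the names `fastq_sequences` and `count_kmers` are made up for illustration (they are not part of FastQC).

```python
from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of every record
                yield line.strip()


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer observed in the reads, skipping k-mers containing N."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts: Counter = Counter()
    for read in reads:
        seq = read.strip().upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Toy reads that share a common 13-base suffix (illustrative data only).
toy_reads = [
    "ACGTACGTAGATCGGAAGAGC",
    "TTTTGGGGAGATCGGAAGAGC",
    "CCCCAAAAAGATCGGAAGAGC",
]
toy_counts = count_kmers(toy_reads, 12)
for kmer, n in toy_counts.most_common(3):
    print(kmer, n)
```

For a real file, something like `count_kmers(fastq_sequences("reads.fastq"), 20).most_common(20)` would list candidates (the file name is a placeholder). Memory grows with the number of distinct k-mers seen, so running this on a subsample of reads is probably the safer choice.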


The fastqc launcher is a Perl wrapper script; change the following lines:

if ($kmer_size) {
    unless ($kmer_size =~ /^\d+$/) {
        die "Kmer size '$kmer_size' was not a number";
    }
    #### CHANGE 10 to WHATEVER...
    if ($kmer_size < 2 or $kmer_size > 10) {
        die "Kmer size must be in the range 2-10";
    }

    push @java_args,"-Dfastqc.kmer_size=$kmer_size";
}

Use at your own risk.


Minimum at 2 obviously makes sense. Any idea why they hard-coded the maximum at 10?


Because 10 is not 'too much' for memory: there are potentially 4^10 = 1,048,576 unique keys in the map. k=20 would give 4^20 = 1,099,511,627,776 potential keys ==> OUT OF MEMORY.
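
The arithmetic behind those numbers can be checked with a short sketch. It only computes the size of the theoretical key space for a 4-letter DNA alphabet; `kmer_keyspace` is a name chosen here for illustration, and nothing below reflects how FastQC itself stores its k-mers.

```python
def kmer_keyspace(k: int, alphabet_size: int = 4) -> int:
    """Return the number of distinct k-mers over the alphabet (4 for A, C, G, T)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return alphabet_size ** k


# Compare the FastQC default (7), its hard-coded maximum (10), and the requested 20.
for k in (7, 10, 20):
    print(f"k={k}: {kmer_keyspace(k):,} possible k-mers")
```

This is an upper bound: a map that stores only k-mers actually seen holds at most one key per k-mer position in the input.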
