Question

Python function to generate k-mers with a window step size

5.8 years ago

flogin ▴ 280

Hello everyone, I was reading about ways to work with k-mers in Python, and I found this function:

mySeq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'

def getKmers(sequence, size):
    return [sequence[x:x+size].upper() for x in range(len(sequence) - size + 1)]

Calling getKmers(mySeq, 6) returns:

['AAATTA', 'AATTAA', 'ATTAAA', 'TTAAAG', 'TAAAGA', 'AAAGAC', 'AAGACA', 'AGACAA', 'GACAAA', 'ACAAAA', 'CAAAAT', 'AAAATC', 'AAATCC', 'AATCCC', 'ATCCCA', 'TCCCAG', 'CCCAGA', 'CCAGAA', 'CAGAAT', 'AGAATG', 'GAATGC', 'AATGCC', 'ATGCCC', 'TGCCCG']

As we can see, the k-mers are generated with a window step of 1. How can I define a step greater than 1, for example 3, to generate k-mers like this:

['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', ...]

Can anyone help?

python kmer biopython fasta • 3.9k views

updated 2.7 years ago by mattmoore_91 • 0 • written 5.8 years ago by flogin ▴ 280

score 3 · Answer 1 · 2020-01-09

5.8 years ago

cschu181 ★ 2.8k

def getKmers(sequence, size, step):
  # + 1 so the final full-length window is included
  return [sequence[x:x+size] for x in range(0, len(sequence) - size + 1, step)]
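A quick note on the range bound, since it decides whether the last window is kept. The sketch below uses a hypothetical helper, kmer_starts (not part of the answer), that only lists the window start positions. Stopping at len - size + 1 includes the final full-length window; stopping at len - size leaves it out whenever that last start position falls on a step.

```python
def kmer_starts(length, size, step, inclusive=True):
    """Start indices of windows of `size` over a sequence of `length` bases.

    Hypothetical helper for illustration only.
    """
    # The last valid start is length - size, so range() must stop one past it.
    stop = length - size + 1 if inclusive else length - size
    return list(range(0, stop, step))


# A 9-base sequence holds 4-mers starting at positions 0 through 5.
print(kmer_starts(9, 4, 1))                   # [0, 1, 2, 3, 4, 5]
print(kmer_starts(9, 4, 1, inclusive=False))  # [0, 1, 2, 3, 4]
```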

You should probably write it as a generator, though:

def getKmers(sequence, size, step):
  # Lazily yield each k-mer; + 1 keeps the final full-length window
  for x in range(0, len(sequence) - size + 1, step):
    yield sequence[x:x+size]
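For reference, here is a minimal, self-contained sketch of the generator approach applied to the sequence from the question. The name iter_kmers, the default step of 1, and the input check are my own assumptions rather than part of the answer; the .upper() call is carried over from the question's original function.

```python
def iter_kmers(sequence, size, step=1):
    """Yield k-mers of length `size`, moving the window by `step` bases.

    Hypothetical name and defaults, for illustration only.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive integers")
    # + 1 so the last full-length window is included.
    for start in range(0, len(sequence) - size + 1, step):
        yield sequence[start:start + size].upper()


my_seq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'
kmers = list(iter_kmers(my_seq, 6, 3))
print(kmers)
# ['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', 'TCCCAG', 'CAGAAT', 'AATGCC']
```

Note that trailing bases that do not fill a whole window are dropped: here the last window ends two bases before the end of the sequence. Because it is a generator, you can also loop over iter_kmers(my_seq, 6, 3) directly without building the full list in memory.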

Thanks cschu181, that's exactly it!

5.8 years ago by flogin ▴ 280

score 0 · Answer 2 · 2023-02-13

2.7 years ago

mattmoore_91 • 0

Consider using this fast parser:

https://github.com/moorembioinfo/KmerAperture/tree/main/parser
