4.9 years ago
flogin
Hello guys, I was reading about Python functions for working with k-mers, and I found this one:
mySeq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'

def getKmers(sequence, size):
    # Slide a window of length `size` along the sequence, one base at a time
    return [sequence[x:x+size].upper() for x in range(len(sequence) - size + 1)]

print(getKmers(mySeq, 6))
Which returns:
['AAATTA', 'AATTAA', 'ATTAAA', 'TTAAAG', 'TAAAGA', 'AAAGAC', 'AAGACA', 'AGACAA', 'GACAAA', 'ACAAAA', 'CAAAAT', 'AAAATC', 'AAATCC', 'AATCCC', 'ATCCCA', 'TCCCAG', 'CCCAGA', 'CCAGAA', 'CAGAAT', 'AGAATG', 'GAATGC', 'AATGCC', 'ATGCCC', 'TGCCCG']
As we can see, the k-mers are generated with a sliding window that advances 1 base at a time. I'm wondering how I can set a window step greater than 1, for example 3, to generate k-mers like this:
['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', ...]
Can anyone help?
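One possible way to do this, as a minimal sketch: `range` accepts a third argument, so the window start can advance by more than one base. The function name `get_kmers_with_step` and the `step` parameter are assumptions made for illustration; they do not come from any library.

```python
def get_kmers_with_step(sequence, size, step=1):
    # `step` (hypothetical parameter) is how many bases the window advances each time.
    # range(start, stop, step) yields the window start positions 0, step, 2*step, ...
    last_start = len(sequence) - size + 1
    return [sequence[i:i + size].upper() for i in range(0, last_start, step)]

my_seq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'
print(get_kmers_with_step(my_seq, 6, 3))
# ['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', 'TCCCAG', 'CAGAAT', 'AATGCC']
```

With `step=1` this behaves like the original function. Note that a trailing stretch shorter than `size` is dropped rather than returned as a partial k-mer.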
Thanks cschu181, that's exactly it!