Most frequent substring of a given length
9.1 years ago

Is there anything in Biopython which could give the most frequently occurring substring of a given length from a sequence read?

biopython • 3.8k views
9.1 years ago
venu 7.1k

If the language doesn't matter, you can use this bash script (it includes a Perl script). I have not updated it in a long time, but it works pretty well. You can specify the k-mer length you are interested in. The output looks like this, and you can modify it to suit your needs:

TAACCCTAAC   23
AACCCTAACC   21
ACCCTAACCC   20 
CCCTAACCCT   20
CCTAACCCTA   20
CTAACCCTAA   18
TAACCCTAAC   18
-----
------
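
If you would rather stay in Python, here is a minimal sketch that produces the same kind of k-mer/count listing. This is not the script linked above and it does not use any Biopython-specific function; it only relies on `collections.Counter` from the standard library. The function name `most_frequent_kmers` and the example read are made up for illustration. It assumes overlapping occurrences should be counted and that the sequence fits in memory; a Biopython `Seq` can be passed because it is converted with `str()` first.

```python
from collections import Counter


def most_frequent_kmers(sequence, k, top=None):
    """Return (k-mer, count) pairs sorted by descending count.

    Overlapping occurrences are counted. `sequence` can be a plain string
    or anything that str() turns into the sequence text.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = str(sequence).upper()
    # Slide a window of width k across the sequence, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # most_common(None) returns every k-mer; most_common(n) only the top n.
    return counts.most_common(top)


if __name__ == "__main__":
    read = "TAACCCTAACCCTAACCCTAACC"  # hypothetical example read
    for kmer, count in most_frequent_kmers(read, 10, top=5):
        print(f"{kmer}\t{count}")
```

For very large inputs (whole genomes, many reads) a dedicated k-mer counter will be much more memory-efficient than a `Counter` of substrings.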
9.1 years ago

How about a k-mer counter, like khmer (not Biopython, but C++ with a Python wrapper)?
