How to write a greedy algorithm program in Python to find a motif in given DNA sequences?

8.6 years ago

Kevin_Smith ▴ 10

In the sequence file, each line contains a single DNA sequence. I would like to run just one round of a greedy algorithm, starting from the first sequence and the first position. The goal is to find a motif of length 7 in the DNA sequences given in the file.
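
One possible reading of "one round of greedy search", as a rough sketch rather than a definitive implementation: seed the motif with the 7-mer at position 0 of the first sequence, then for each later sequence add the k-mer that is most probable under the profile of the sites picked so far. The names `build_profile`, `kmer_probability` and `greedy_round`, and the pseudocount of 1, are assumptions made for illustration. The input is assumed to contain only uppercase A, C, G and T, with every sequence at least k long.

```python
from collections import Counter

BASES = "ACGT"


def build_profile(sites, pseudocount=1.0):
    """Per-position base probabilities for equal-length sites.

    The pseudocount is an assumption; it keeps unseen bases from getting
    probability zero.
    """
    total = len(sites) + pseudocount * len(BASES)
    profile = []
    for column in zip(*sites):
        counts = Counter(column)
        profile.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return profile


def kmer_probability(kmer, profile):
    """Probability of a k-mer under the profile (product over positions)."""
    prob = 1.0
    for base, column in zip(kmer, profile):
        prob *= column[base]
    return prob


def greedy_round(sequences, k=7):
    """One greedy pass, seeded with the k-mer at position 0 of the first sequence."""
    sites = [sequences[0][:k]]
    for seq in sequences[1:]:
        profile = build_profile(sites)
        candidates = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        # max() keeps the first maximum, so ties go to the leftmost k-mer.
        best = max(candidates, key=lambda kmer: kmer_probability(kmer, profile))
        sites.append(best)
    return sites
```

Calling `greedy_round(sequences, k=7)` returns one site per input sequence, in input order. Scoring candidates by total information content instead of profile probability would be another reasonable choice.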

The input for the program should be the sequence file name.
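
Reading the input could look something like this minimal sketch. The function name `read_sequences` is hypothetical, and skipping blank lines and uppercasing are assumptions about how the file should be handled.

```python
def read_sequences(path):
    """Return the non-empty lines of a file as uppercase DNA strings."""
    with open(path) as handle:
        return [line.strip().upper() for line in handle if line.strip()]
```

In a script, the file name would typically come from the command line, for instance `read_sequences(sys.argv[1])` after `import sys`.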

The output should include:

the sites, or k-mers (each one a 7-mer taken from one of the input sequences)
the PPM (position probability matrix) over A, T, C and G
and the total information content of the PPM (for 7-mers).

For example:

Input

TCTGAGCTTGCGTTATTTTTAGACC

GTTTGACGGGAACCCGACGCCTATA

Output

kmers:

TTCCT TTGCG

PPM:

A 0.091 0.091 0.091 0.091 0.091

T 0.727 0.727 0.091 0.091 0.545

total information content: 22.47
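
For the PPM and information content, a hedged sketch follows. The values 0.091, 0.727 and 0.545 in the example equal 1/11, 8/11 and 6/11, which is what a pseudocount of 1 would give for seven sites, so the code assumes a pseudocount of 1 by default. The information content here is the usual sum over positions of 2 + Σ p·log2(p) in bits against a uniform background. The post does not state the definition behind its total, so this may not reproduce that number. All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_probability_matrix(sites, pseudocount=1.0):
    """Map each base to its list of per-position probabilities."""
    total = len(sites) + pseudocount * len(BASES)
    columns = [Counter(col) for col in zip(*sites)]
    return {b: [(c[b] + pseudocount) / total for c in columns] for b in BASES}


def total_information_content(ppm):
    """Sum over positions of 2 + sum_b p*log2(p), in bits (uniform background)."""
    width = len(ppm[BASES[0]])
    total = 0.0
    for i in range(width):
        # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
        entropy_term = sum(
            ppm[b][i] * math.log2(ppm[b][i]) for b in BASES if ppm[b][i] > 0
        )
        total += 2.0 + entropy_term
    return total


def format_ppm(ppm):
    """Render one row per base, three decimals, in the style of the example."""
    return "\n".join(
        b + " " + " ".join(f"{p:.3f}" for p in ppm[b]) for b in BASES
    )
```

Usage would be along the lines of `ppm = position_probability_matrix(sites)`, then `print(format_ppm(ppm))` and `print(total_information_content(ppm))`.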

Can anyone help me with a Python script? Thank you very much!

gene sequence alignment blast next-gen • 3.8k views
