How to write a greedy algorithm program in Python to find a motif in given DNA sequences?
8.6 years ago
Kevin_Smith ▴ 10

In the sequence file, each line contains a single DNA sequence. I would like to try just one round of a greedy algorithm, starting from the first sequence and the first position. The goal is to find a motif of length 7 in the DNA sequences given in the file.
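
One possible reading of a single greedy round is sketched below: the first 7-mer of the first sequence is taken as the seed, and each later sequence contributes the 7-mer that scores best together with the sites chosen so far. The names `greedy_round` and `consensus_score` are hypothetical, and the scoring rule (sum of the most common base count per column) is an assumption, since the question does not say how candidates are ranked.

```python
from collections import Counter


def consensus_score(kmers):
    """Sum, over columns, of the count of the most common base.

    Assumed scoring rule; the question does not specify one.
    """
    return sum(Counter(column).most_common(1)[0][1] for column in zip(*kmers))


def greedy_round(sequences, k=7, score=consensus_score):
    """One greedy pass over the sequences, in file order.

    Seed with the first k-mer of the first sequence, then add the
    best-scoring k-mer from each remaining sequence.
    """
    if not sequences or len(sequences[0]) < k:
        raise ValueError("the first sequence must contain at least k bases")
    sites = [sequences[0][:k]]
    for seq in sequences[1:]:
        candidates = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if not candidates:
            continue  # sequence shorter than k: nothing to pick
        # max() keeps the first candidate on ties, i.e. the leftmost position
        best = max(candidates, key=lambda kmer: score(sites + [kmer]))
        sites.append(best)
    return sites


if __name__ == "__main__":
    seqs = ["TCTGAGCTTGCGTTATTTTTAGACC", "GTTTGACGGGAACCCGACGCCTATA"]
    print("kmers:", " ".join(greedy_round(seqs, k=7)))
```

A different ranking, such as the information content of the resulting matrix, could be passed in through the `score` parameter without changing the loop.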

The input for the program should be the sequence file name.
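
A minimal sketch of the input handling, assuming the file name arrives as the only command-line argument and that blank lines should be skipped. The function names `read_sequences` and `main` and the script name `motif.py` are placeholders, not part of the question.

```python
import sys


def read_sequences(path):
    """Return the non-empty lines of the file as upper-case DNA strings."""
    with open(path) as handle:
        return [line.strip().upper() for line in handle if line.strip()]


def main(argv):
    """Entry point: argv[1] is expected to be the sequence file name."""
    if len(argv) != 2:
        print("usage: python motif.py SEQUENCE_FILE", file=sys.stderr)
        return 1
    sequences = read_sequences(argv[1])
    print(f"read {len(sequences)} sequences")
    return 0
```

In a script, the last line would typically be `sys.exit(main(sys.argv))`.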

The output should include:

  • the sites or kmers (each of them is a 7mer sequence from a sequence)
  • the PPM (position probability matrix) for ATCG
  • and the total information content of the PPM (for 7-mers).
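
The last two outputs can be sketched as follows, given a list of equal-length sites. Two things here are assumptions rather than anything stated in the question: a pseudocount of 1 is added to every base count, and information content is taken as the sum over columns of p * log2(p / 0.25), in bits against a uniform background. This definition may not reproduce the total shown in the example, which does not say how it was computed.

```python
import math

BASES = "ATCG"  # row order used in the question


def build_ppm(kmers, pseudocount=1.0):
    """Position probability matrix as {base: [probability per column]}."""
    k = len(kmers[0])
    total = len(kmers) + pseudocount * len(BASES)
    ppm = {base: [0.0] * k for base in BASES}
    for pos in range(k):
        column = [kmer[pos] for kmer in kmers]
        for base in BASES:
            ppm[base][pos] = (column.count(base) + pseudocount) / total
    return ppm


def information_content(ppm, background=0.25):
    """Total information content in bits, summed over all columns."""
    return sum(p * math.log2(p / background)
               for row in ppm.values() for p in row if p > 0)


def format_ppm(ppm):
    """One line per base, probabilities rounded to three decimals."""
    return "\n".join(
        f"{base} " + " ".join(f"{p:.3f}" for p in row)
        for base, row in ppm.items()
    )


if __name__ == "__main__":
    sites = ["TTGCGTT", "TTGACGG"]  # illustrative 7-mers only
    ppm = build_ppm(sites)
    print("PPM:")
    print(format_ppm(ppm))
    print(f"total information content: {information_content(ppm):.2f}")
```

With a pseudocount of 1, seven identical sites would give 8/11 (about 0.727) for the observed base and 1/11 (about 0.091) for each of the others.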

For example (this illustration uses 5-mers and shows only the A and T rows of the PPM):

Input

TCTGAGCTTGCGTTATTTTTAGACC

GTTTGACGGGAACCCGACGCCTATA

Output

kmers:

TTCCT TTGCG

PPM:

A 0.091 0.091 0.091 0.091 0.091

T 0.727 0.727 0.091 0.091 0.545

total information content: 22.47

Can anyone help me with a Python script? Thank you very much!

gene sequence alignment blast next-gen • 3.8k views