Consensus Sequence
1
4
Entering edit mode
14.2 years ago
User 5217 ▴ 40

Hello, I have below 8 sequences and I would like to calculate a consensus sequence from them.

sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]

for i in range(len(sequences[1])):
  alignment = ""
  for j in range(len(sequences)):
    alignment += sequences[j][i]
  print alignment
  print alignment.count("A")
  print alignment.count("C")
  print alignment.count("G")
  print alignment.count("T")
  print "----------"

The above code calculates to each position how often a base occurs (Position Frequency Matrix). I have found the following rules ( http://www.cisred.org/content/methods/help/pfm ) to calculate the consensus sequence, but unfortunataly I do not quite understand it yet to complete the implementation of consensus sequence.

Thank you in advance.

Best regards,

biopython python consensus • 8.4k views
ADD COMMENT
1
Entering edit mode

You should look at Brad's suggestion using Biopython in this question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?

ADD REPLY
0
Entering edit mode

Notes: If you want the length of the first sequence then you should use len(sequences[0]) instead of 1. Without modifying the rest of the code, the sequences could be in string format "CCCATTGTTCTC". Cheers

ADD REPLY
8
Entering edit mode
14.2 years ago
brentp 24k

Check out motility which does exactly that:

import motility
sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]

pwm = motility.make_pwm(sequences)
print pwm.generate_sites_over(pwm.max_score())

prints

('CCCATTGTTCTC', 'TCCATTGTTCTC')
ADD COMMENT

Login before adding your answer.

Traffic: 2006 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6