14.1 years ago
User 5217
Hello, I have the 8 sequences below and I would like to calculate a consensus sequence from them.
sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
]
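# Walk over the alignment column by column (all sequences have the same length).
for i in range(len(sequences[1])):
    alignment = ""
    # Collect the base at position i from every sequence.
    for j in range(len(sequences)):
        alignment += sequences[j][i]
    # Print the column followed by the counts of A, C, G and T.
    print(alignment)
    print(alignment.count("A"))
    print(alignment.count("C"))
    print(alignment.count("G"))
    print(alignment.count("T"))
    print("----------")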
The code above counts how often each base occurs at each position (a position frequency matrix). I have found the following rules ( http://www.cisred.org/content/methods/help/pfm ) for calculating the consensus sequence, but unfortunately I do not understand them well enough yet to complete the implementation.
Thank you in advance.
Best regards,
You should look at Brad's suggestion using Biopython in this question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?
Notes: If you want the length of the first sequence then you should use len(sequences[0]) instead of index 1. Without modifying the rest of the code, the sequences could also be in string format, e.g. "CCCATTGTTCTC".
Cheers
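For what it's worth, here is a minimal sketch of a plain majority-rule consensus built from the position frequency matrix, with the sequences stored as strings as suggested in the notes. It is not the rule set from the linked cisred page: it simply picks the most frequent base in each column and, as an assumption of this sketch, breaks ties alphabetically (A before C before G before T). The function names position_frequency_matrix and majority_consensus are made up for illustration.

```python
from collections import Counter

BASES = "ACGT"

# The eight sequences from the question, written as strings.
sequences = [
    "CCCATTGTTCTC",
    "TTTCTGGTTCTC",
    "TCAATTGTTTAG",
    "CTCATTGTTGTC",
    "TCCATTGTTCTC",
    "CCTATTGTTCTC",
    "TCCATTGTTCGT",
    "CCAATTGTTTTG",
]


def position_frequency_matrix(seqs):
    """Return one Counter of base counts per alignment column."""
    # zip(*seqs) yields the columns of the alignment one at a time.
    return [Counter(column) for column in zip(*seqs)]


def majority_consensus(seqs):
    """Pick the most frequent base per column (ties broken alphabetically)."""
    pfm = position_frequency_matrix(seqs)
    # max() returns the first maximal item, so iterating over "ACGT"
    # makes the alphabetically first base win a tie.
    return "".join(max(BASES, key=lambda base: counts[base]) for counts in pfm)


if __name__ == "__main__":
    print(majority_consensus(sequences))
```

A stricter scheme would replace the max() call with whatever thresholds and ambiguity codes you settle on, while the column counting stays the same.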