4.7 years ago
jbrody11
I am doing a practice exam question shown here:
I understand how to build the frequency matrix, but I am unsure about the sequence logo formula calculations. Should my maximum information content be 2 bits, or could it be up to 4 or even 8?
Yes, the maximum is 2 bits.
Okay, thanks, but I still see an issue with the logo formula they give. I understand that the first column of the logo will have an A with a height of 2 bits, but my calculations for other columns, for instance the second column, end up with bit values like 3.295, which is above the maximum. Would you kindly look at how the second column's bit value is calculated? The frequencies are A=0.125, C=0.25, G=0.625, T=0.
Maximum information content (MIC) in logo representations is log2(N), where N is the number of unique residue types. That means MIC is 2 bits for nucleic acids and 4.321928095 bits for proteins.
Okay, thanks, but I'm still confused about the formula they provided for the sequence logo. For instance, in the second column, G has a frequency of 0.625, so I calculate its information content to be 2 - (0.625 x log2(0.625)), but this gives 2.42, which is above the maximum IC ... ?
The formula you have is incorrect. The way it was written for you, it should be 2 + ... . Specifically, the information content is log2(N) - (H + e), where N is the number of unique residue types, H is Shannon's uncertainty and e is a small-number correction. Since H itself is the negative of a sum (H = -Σ(Fbi * log2(Fbi))), the formula essentially becomes 2 + ... if we ignore e, as was done in your formula above. See here and here for details.
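As a rough illustration of the calculation described above, here is a minimal Python sketch for the second column from the question. It assumes the small-number correction e is ignored (set to 0) and that each letter's height is its frequency times the column's information content. The function names `column_information` and `letter_heights` are made up for this example.

```python
import math


def column_information(freqs, n_types=4, correction=0.0):
    """Information content (bits) of one column: log2(N) - (H + e)."""
    # H = -sum(f * log2(f)); residues with f == 0 are skipped because
    # f * log2(f) tends to 0 as f tends to 0.
    uncertainty = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return math.log2(n_types) - (uncertainty + correction)


def letter_heights(freqs, n_types=4, correction=0.0):
    """Height of each letter: its frequency times the column information."""
    total = column_information(freqs, n_types, correction)
    return {residue: f * total for residue, f in freqs.items()}


# Second column from the question (e is assumed to be 0 here).
column2 = {"A": 0.125, "C": 0.25, "G": 0.625, "T": 0.0}
info = column_information(column2)
heights = letter_heights(column2)

print(f"column information: {info:.3f} bits")
# Letters are stacked from smallest to largest, so print in that order.
for residue, height in sorted(heights.items(), key=lambda item: item[1]):
    print(f"{residue}: {height:.3f} bits")
```

Under these assumptions the whole column carries about 0.70 bits, and the letter heights add up to that total, so no stack can exceed the 2-bit maximum.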
Okay, thanks, that's interesting, because that question is from a previous year's university exam, so I'm surprised they gave the wrong formula. Just one last question, please. In that column, if I use the correct formula, I get G=1.58, C=1.5, A=1.625, but now I have to multiply each of these by its frequency to get the actual information content heights. So G will now be 0.99, C will be 0.375 and A will be 0.2. Therefore my logo in that column will have a small A at the bottom with a height of 0.2 bits, then C up to a height of 0.575 bits (0.375 + 0.2), and G up to a height of 1.565 bits (0.99 + 0.375 + 0.2). Am I correct with this?