I have made some logo plots using http://weblogo.threeplusone.com/ and I have also seen these plots in various papers. I'm a cell biologist, so unfortunately I don't understand much about "information content" or entropy when reading about what bits really are. However, most people still seem to prefer bits over probability. So, could anyone tell me (or point me to an easy-to-understand explanation) why bits are better? What benefits am I missing if I go with probability? Is it still "ok" to go with probability, or is it a big no-no amongst bioinformaticians?
- I think it is very easy to understand conservation with probability. The total height simply represents all the input sequences, so if an amino acid takes up half of the total height, it is present in half of the sequences
- it is very easy to compare conservation between different residues since the total height of every position is the same
- show a probability logo to a non-bioinformatician and they will get it with no prior knowledge. Show a bit logo and they will most likely be confused (not only about what a "bit" is, but also about why the total height differs between residues). Today my PI asked me if I had photoshopped away some amino acids since the residues didn't add up to the same height :(
I don't see how the use of the bit is pretentious. It wasn't chosen to sound "cool"; it was chosen because sequence logos build directly on information theory.
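To make the difference concrete, here is a minimal sketch of how the two kinds of height can be computed for a single alignment column. It assumes a 20-letter protein alphabet and leaves out the small-sample correction that WebLogo applies, so the numbers will not match its output exactly. The function name `column_heights` and the two toy columns are made up for illustration.

```python
import math
from collections import Counter


def column_heights(column, alphabet_size=20):
    """Return (probabilities, information content, letter heights in bits)
    for one alignment column. Hypothetical helper, no small-sample correction."""
    counts = Counter(column)
    total = len(column)
    # Probability logo: each letter's height is simply its frequency.
    probs = {aa: n / total for aa, n in counts.items()}
    # Shannon entropy of the column, in bits.
    entropy = -sum(p * math.log2(p) for p in probs.values())
    # Information content: maximum possible entropy minus observed entropy.
    info = math.log2(alphabet_size) - entropy
    # Bit logo: each letter's height is its frequency scaled by the
    # information content of the whole column.
    bits = {aa: p * info for aa, p in probs.items()}
    return probs, info, bits


# Toy columns: one fully conserved, one split evenly between two residues.
conserved_probs, conserved_info, conserved_bits = column_heights("LLLLLLLL")
mixed_probs, mixed_info, mixed_bits = column_heights("LLLLAAAA")

print(f"conserved column: {conserved_info:.2f} bits total")
print(f"mixed column:     {mixed_info:.2f} bits total")
```

In a probability logo both columns have the same total height (the frequencies always sum to 1). In a bit logo the conserved column is taller than the mixed one, which is why the stacks don't add up to the same height: the total height of each stack is itself the measure of conservation at that position.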