Why does WebLogo (http://weblogo.berkeley.edu/logo.cgi) generate sequence logos where the scale exceeds 2? If a particular position were completely determined by one base, it would only be two bits. What does it mean when it's higher?
The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. (This is what you see in the abstract of the WebLogo paper.)
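In other words, the height in bits measures conservation, and its maximum is set by the alphabet: log2(4) = 2 bits for DNA or RNA, and log2(20), about 4.32 bits, for protein. The number of sequences used to build the logo matters only through the small-sample correction, which lowers the height when few sequences are available; it does not raise the scale above that maximum.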
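For reference, here is the usual definition of stack height in a sequence logo, written as a sketch. The symbols are my own notation, not taken from the WebLogo page: s is the alphabet size, f_{a,i} is the observed frequency of symbol a at position i, and n is the number of sequences.

```latex
R_i = \log_2 s - \left( H_i + e_n \right), \qquad
H_i = -\sum_{a} f_{a,i} \log_2 f_{a,i}, \qquad
e_n = \frac{s - 1}{2 \ln 2 \cdot n}
```

For a fully conserved DNA position, H_i = 0, so with many sequences (e_n close to 0) the height approaches log2(4) = 2 bits. For a position where all four bases are equally frequent, H_i = 2 and the height is about 0.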
I am not sure how WebLogo handles this, but you can do a bit more in R. Here is a piece of code that might be useful.
# install the packages (biocLite is the older installer; on current Bioconductor releases use BiocManager::install(c("seqLogo", "Biostrings")) instead)
source("http://bioconductor.org/biocLite.R")
biocLite(c("seqLogo", "Biostrings"))
# load the packages
library(seqLogo)
library(Biostrings)
sigH10 <- scan("sigH10.txt", what="character") # read the file containing the aligned sequences
sigH10_PWM <- consensusMatrix(unique(sigH10), as.prob=TRUE) # calculate the position probability matrix
seqLogo(sigH10_PWM, ic.scale=TRUE) # generate the logo (seqLogo expects a 4-row A/C/G/T matrix; ic.scale=TRUE puts the y-axis in bits). Play a bit more to get a nice logo :)
You can read the documentation here to get an idea. Hope this helps!
If you do not know where the motif is, you can use MEME (http://meme.nbcr.net/meme/) to find it and generate a logo.
The maximum number of bits also depends on whether you are working with a protein's sequence of amino acids, DNA or RNA, or another sequence that uses a different alphabet (for example, one of the BioPython alphabets). You don't specify that in your question, but a position can carry at most log2 of the alphabet size: 2 bits for the four nucleotides and about 4.32 bits for the 20 amino acids. So a scale above 2 bits may simply mean that the input is being treated as protein.
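To make the alphabet dependence concrete, here is a minimal base-R sketch that computes the per-position information content of a probability matrix. The function name `info_content`, its `n_seqs` argument, and the toy matrix are made up for illustration, and this is not necessarily how WebLogo or seqLogo implement it.

```r
# Per-position information content (in bits) of a probability matrix.
# Rows are symbols, columns are positions; each column should sum to 1.
# `info_content` and `n_seqs` are hypothetical names used for this sketch.
info_content <- function(prob, n_seqs = NULL) {
  s <- nrow(prob)                                   # alphabet size
  terms <- ifelse(prob > 0, prob * log2(prob), 0)   # treat 0 * log2(0) as 0
  entropy <- -colSums(terms)                        # Shannon entropy per position
  # Optional small-sample correction: (s - 1) / (2 * ln(2) * n)
  correction <- if (is.null(n_seqs)) 0 else (s - 1) / (2 * log(2) * n_seqs)
  log2(s) - (entropy + correction)
}

# The upper bound is log2 of the alphabet size.
max_bits <- log2(c(DNA = 4, protein = 20))  # DNA: 2 bits, protein: about 4.32 bits

# Toy DNA matrix (filled column by column): a fully conserved position,
# a uniform position, and a position split evenly between A and C.
dna <- matrix(c(1,    0,    0,    0,
                0.25, 0.25, 0.25, 0.25,
                0.5,  0.5,  0,    0),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
ic <- info_content(dna)  # 2, 0 and 1 bits
```

Passing `n_seqs` subtracts the small-sample correction, so fewer sequences give slightly shorter stacks, never taller ones.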
How can I normalize it so that it's always out of 2 bits?
I have updated my answer above! Maybe that helps!