Sequence Logos With Weblogo Scale
11.6 years ago
user ▴ 950

Why does WebLogo (http://weblogo.berkeley.edu/logo.cgi) generate sequence logos where the scale exceeds 2 bits? If a particular position were completely determined by one base, it would only be 2 bits. What does it mean when it's higher?

11.6 years ago
k.nirmalraman ★ 1.1k

The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. (This is quoted from the abstract of the WebLogo paper.)
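To make that definition concrete, here is a minimal base-R sketch for equal-length DNA sequences. It takes the stack height of each column as log2(4) minus the Shannon entropy of the base frequencies, and each letter's height as its frequency times the stack height. It deliberately ignores the small-sample correction WebLogo applies, and the function name `logo_heights` is hypothetical.

```r
# Hypothetical helper (sketch): per-column stack heights in bits and letter heights
# for aligned DNA sequences of equal length. No small-sample correction is applied.
logo_heights <- function(seqs, alphabet = c("A", "C", "G", "T")) {
  mat <- do.call(rbind, strsplit(toupper(seqs), ""))  # rows = sequences, cols = positions
  # count each base per position; symbols outside the alphabet become NA and are ignored
  counts <- sapply(seq_len(ncol(mat)), function(j) table(factor(mat[, j], levels = alphabet)))
  freqs <- sweep(counts, 2, colSums(counts), "/")     # counts -> relative frequencies
  # Shannon entropy in bits per column, treating 0 * log2(0) as 0
  entropy <- apply(freqs, 2, function(p) -sum(p[p > 0] * log2(p[p > 0])))
  ic <- log2(length(alphabet)) - entropy              # stack height, at most 2 bits for DNA
  list(ic = ic, letters = sweep(freqs, 2, ic, "*"))   # letter height = frequency * stack height
}

h <- logo_heights(c("ACGT", "ACGA", "ACTT", "AGCC"))
round(h$ic, 3)  # -> 2.000 1.189 0.500 0.500
```

A fully conserved column (the first one here) reaches exactly 2 bits and no column can exceed it, which is the bound the question expects.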

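In other words, the stack height is the information content of that column. For nucleotides it can never exceed log2(4) = 2 bits (for proteins the limit is log2(20) ≈ 4.32 bits). The number of sequences enters only through a small-sample correction: with few sequences the heights are pulled down, and they approach their true values as more sequences are added.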
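Logo programs in the WebLogo family correct for sample size by subtracting a term e(n) from the information content. This sketch uses the commonly cited large-sample approximation e(n) ≈ (s − 1) / (2 · ln 2 · n), where s is the alphabet size and n the number of sequences. The exact correction WebLogo uses for very small n may differ, and `corrected_ic` is a hypothetical name. It shows that the sequence count only ever lowers a DNA column below 2 bits, never raises it above.

```r
# Hypothetical helper (sketch): information content of one column minus the
# approximate small-sample correction e(n) = (s - 1) / (2 * ln(2) * n).
corrected_ic <- function(freqs, n) {
  s <- length(freqs)                                  # alphabet size (4 for DNA)
  entropy <- -sum(freqs[freqs > 0] * log2(freqs[freqs > 0]))
  e_n <- (s - 1) / (2 * log(2) * n)                   # shrinks as n grows
  max(0, log2(s) - (entropy + e_n))                   # this sketch clips negative heights to 0
}

conserved <- c(A = 1, C = 0, G = 0, T = 0)            # the same base in every sequence
sapply(c(5, 20, 100, 1000), function(n) corrected_ic(conserved, n))
# approximately 1.57, 1.89, 1.98, 2.00
```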

I am not sure how WebLogo scales the axis internally, but you can get more control in R. Here is a piece of code that might be useful.

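# install the packages (biocLite is deprecated; current Bioconductor uses BiocManager)
install.packages("BiocManager")
BiocManager::install(c("seqLogo", "Biostrings"))
# load the packages
library(seqLogo)
library(Biostrings)
sigH10 <- scan("sigH10.txt", what = "character")  # read the aligned sequences, one per line, all the same length
dna <- DNAStringSet(unique(sigH10))  # drop duplicate sequences, as in the original
# per-position probabilities of A, C, G, T (assumes no other symbols in the sequences)
sigH10_PWM <- consensusMatrix(dna, as.prob = TRUE, baseOnly = TRUE)[c("A", "C", "G", "T"), ]
seqLogo(makePWM(sigH10_PWM), ic.scale = TRUE)  # generate the seqlogo; the y-axis is information content in bits.. play a bit more to get a nice logo :)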

You can read the documentation here to get an idea. Hope this helps!

How can I normalize it so that it's always out of 2 bits?

I have updated my answer above. Maybe that helps!

11.6 years ago

If you do not know where the motif is, you can use MEME (http://meme.nbcr.net/meme/) to find it and generate a logo.

The sequences I work with are too short for MEME (it needs 8 or higher)

11.6 years ago

The maximum number of bits also depends on whether you are working with a protein's amino acid sequence, DNA or RNA, or another sequence that uses a different BioPython alphabet, because a position can carry at most log2 of the alphabet size. You don't specify that in your question, but in general a larger alphabet is one way to end up with more than 2 bits of information per position.
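The bound depends only on the alphabet size s: a column carries at most log2(s) bits. A quick check in R, assuming 4 symbols for DNA/RNA and the 20 standard amino acids for protein, shows that heights above 2 bits are only possible with an alphabet larger than four letters.

```r
# Maximum information content per position, in bits, for a few alphabet sizes
alphabet_sizes <- c(DNA = 4, RNA = 4, protein = 20)
max_bits <- log2(alphabet_sizes)
round(max_bits, 2)  # DNA 2.00, RNA 2.00, protein 4.32
```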

Good point. To clarify, I am only working with A, T, G and C, so I want a sequence logo with an upper bound of 2 bits, and it's not clear to me why WebLogo gives 4 bits when it recognizes the alphabet A, T, G, C.
