Question

Why Does Plotting Log Fold Change Threshold Against Number Of Differentially Expressed Genes Result In A Sigmoid Curve?

0

11.6 years ago

pollyD ▴ 30

Hi all,

I'm working with RNA microarrays, and I'm a newbie to the field. I processed the results with the limma package for Bioconductor.

I'm trying to decide which fold change threshold I should use. I plotted the logFC threshold against the number of differentially expressed genes. I have three conditions, so I got three curves (**plot** here), which look very much like reverse sigmoid curves. I asked colleagues and was told that good-quality data usually show such curves, but that it doesn't imply anything.

So, could you please explain why they look this way, or where I could find an explanation? Also, is it possible to use these data to define a log fold change threshold? Do the plateau, transition and exponential phases have any biological meaning?

Thanks in advance!

microarray rna bioconductor • 5.1k views

ADD COMMENT • link updated 11.2 years ago by Biostar 20 • written 11.6 years ago by pollyD ▴ 30

2

The curves look as expected IMHO, because you are ignoring the sign of the log-fold change. Most genes change expression only marginally, which is why you see a plateau between 0 and 1. If you plot the sorted log-fold changes with sign, you'll see a curve like this: http://en.wikipedia.org/wiki/File:Logistic-curve.svg (or a Gaussian curve if you plot the histogram). You can choose a typical joint fold change and p-value cut-off, say abs(FC) = 1.5X and p-value 0.05.
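
A minimal sketch of these plots and of a joint cut-off, using simulated values rather than real limma output. The data frame `tt` is hypothetical; its column names `logFC` and `P.Value` mimic limma's `topTable()` output, the p-values are invented for illustration, and logFC is assumed to be on the log2 scale:

```r
set.seed(1)
n <- 5000
# Simulated stand-in for a limma topTable() result; all values are invented.
tt <- data.frame(logFC = rnorm(n, mean = 0, sd = 0.5))
# Crude fake p-values: larger |logFC| gives a smaller p (illustration only).
tt$P.Value <- 2 * pnorm(-abs(tt$logFC) / 0.25)

# Sorted signed log fold changes, plotted against their rank.
plot(sort(tt$logFC), type = "l", xlab = "rank", ylab = "logFC")
# The same values as a histogram look roughly bell-shaped.
hist(tt$logFC, breaks = 50, main = "", xlab = "logFC")

# Joint cut-off: more than 1.5-fold change (log2 scale) and p < 0.05.
fc_cut <- log2(1.5)
selected <- tt[abs(tt$logFC) > fc_cut & tt$P.Value < 0.05, ]
nrow(selected)
```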

ADD REPLY • link 11.6 years ago by Woa ★ 2.9k

0

Thanks for the idea, but it doesn't explain what I have. If I do as you suggest, I will get another representation of the volcano plot, won't I?

I might have misled you with the axis label (sorry!). It's the logFC threshold. So the output is the number of genes regarded as differentially expressed at this threshold, a kind of cumulative variable. If I plot upregulated and downregulated genes separately, I get similar curves.
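
A minimal sketch of this kind of calculation, with simulated values standing in for the real logFC column (`logFC`, `thresholds`, `n_up` and `n_down` are illustrative names, and no p-value filter is applied here):

```r
set.seed(2)
logFC <- rnorm(5000, mean = 0, sd = 0.5)  # simulated, not real data
thresholds <- seq(0, 2, by = 0.1)

# Number of genes passing each threshold, split by direction.
n_up   <- sapply(thresholds, function(t) sum(logFC >  t))
n_down <- sapply(thresholds, function(t) sum(logFC < -t))

plot(thresholds, n_up, type = "b", pch = 20, col = "red",
     xlab = "logFC threshold", ylab = "genes passing",
     ylim = range(0, n_up, n_down))
lines(thresholds, n_down, type = "b", pch = 20, col = "blue")
legend("topright", legend = c("up", "down"),
       col = c("red", "blue"), pch = 20)
```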

ADD REPLY • link 11.6 years ago by pollyD ▴ 30

0

For me it's difficult to tell WHY the number of differential proteins vs. a cut-off FC gives such a curve, but my guess is that the 'central part' of the log-FC histogram comes from a Gaussian model and the 'tails', which contain the differential proteins, follow some kind of heavy-tailed distribution. People have tried to model gene/protein expression using mixture models, viz. a Gaussian combined with a Generalized Pareto: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007454
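
A rough sketch of that guess, on simulated data only. The heavy tails here are drawn from a Student t distribution with 3 degrees of freedom purely as a convenient stand-in; this is not the Generalized Pareto model from the linked paper, and the 10% tail fraction and the spreads are arbitrary assumptions:

```r
set.seed(3)
n <- 5000
# Central bulk: Gaussian. Tails: Student t (3 df) as a simple heavy-tailed
# stand-in, NOT the Generalized Pareto model from the linked paper.
is_tail <- runif(n) < 0.1
mix <- ifelse(is_tail, rt(n, df = 3), rnorm(n, sd = 0.3))
gauss <- rnorm(n, sd = sd(mix))  # pure Gaussian with matched overall spread

thresholds <- seq(0, 4, by = 0.25)
count_above <- function(x) sapply(thresholds, function(t) sum(abs(x) > t))
n_mix <- count_above(mix)
n_gauss <- count_above(gauss)

plot(thresholds, n_mix, type = "b", pch = 20, col = "red", ylim = c(0, n),
     xlab = "cut-off", ylab = "values beyond cut-off")
lines(thresholds, n_gauss, type = "b", pch = 20, col = "blue")
legend("topright", legend = c("Gaussian + heavy tails", "pure Gaussian"),
       col = c("red", "blue"), pch = 20)
```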

ADD REPLY • link 11.6 years ago by Woa ★ 2.9k

0

Well, I definitely have to learn more about the methods I'm using.

Thanks for the link!

ADD REPLY • link 11.6 years ago by pollyD ▴ 30

score 0 · Answer 1 · 2013-04-30

0

11.6 years ago

Woa ★ 2.9k

BTW, how are you defining "differentially expressed" at a given threshold? Gaussian data would produce a similar curve, although the shoulder is less broad:

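# Simulate 5000 Gaussian "log fold changes" (sd = 0.3 * 4 = 1.2)
set.seed(1)  # fixed seed so the plot is reproducible
my.data <- rnorm(5000, 0, 0.3) * 4.0
summary(my.data)
hist(my.data)

# Cut-offs to test, and the number of values beyond each one in absolute value
thres <- seq(0.5, 4, 0.5)
my.sum <- sapply(thres, function(t) sum(abs(my.data) > t))

# Count versus cut-off
plot(thres, my.sum, pch = 20, cex = 2.0, col = "hotpink")
lines(thres, my.sum, col = "blue", lty = 4, lwd = 1)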
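
For purely Gaussian values, the shape of this curve can also be written down directly: with N values of standard deviation s, the expected number beyond a cut-off t in absolute value is N * 2 * pnorm(-t / s). A small sketch comparing that expectation with a simulation, assuming the same spread as above (0.3 * 4 = 1.2); the variable names are illustrative:

```r
set.seed(4)
n <- 5000
s <- 1.2  # same spread as rnorm(5000, 0, 0.3) * 4
x <- rnorm(n, mean = 0, sd = s)

thresholds <- seq(0, 4, by = 0.25)
# Simulated count of values beyond each cut-off.
observed <- sapply(thresholds, function(t) sum(abs(x) > t))
# Expected count under a pure Gaussian: N * P(|X| > t).
expected <- n * 2 * pnorm(-thresholds / s)

plot(thresholds, observed, pch = 20, col = "hotpink",
     xlab = "cut-off", ylab = "values beyond cut-off")
lines(thresholds, expected, col = "blue", lty = 4)
```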

ADD COMMENT • link 11.6 years ago by Woa ★ 2.9k

0

I define differentially expressed genes as those having absolute log FC > 0.5 and p < 0.001 (in this case). The p-values come from the linear model and were subjected to Benjamini-Hochberg correction.
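
A minimal sketch of this definition applied over a range of thresholds, on simulated data. The constant standard error `se` and the z-like p-values are assumptions made only to have something to filter on; they do not reproduce limma's moderated t-test:

```r
set.seed(5)
n <- 5000
se <- 0.1                                # assumed constant standard error
logFC <- rnorm(n, mean = 0, sd = 0.5)    # simulated, not real data
p_raw <- 2 * pnorm(-abs(logFC / se))     # two-sided p from a z-like statistic
p_adj <- p.adjust(p_raw, method = "BH")  # Benjamini-Hochberg correction

# Genes called DE at each logFC threshold, with the p-value filter fixed.
thresholds <- seq(0, 2, by = 0.1)
n_de <- sapply(thresholds, function(t) sum(abs(logFC) > t & p_adj < 0.001))

plot(thresholds, n_de, type = "b", pch = 20,
     xlab = "logFC threshold", ylab = "genes called DE")
```

In this toy setup the curve is flat at small thresholds, because the p-value filter already excludes every gene with a small logFC. Whether that is what produces the plateau in the real data is only a guess.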

ADD REPLY • link 11.6 years ago by pollyD ▴ 30