Requirement of constant variance in limma-voom
4.5 years ago
Aspire ▴ 370

In the voom article it is stated that log-cpm of RNA-Seq data can be treated as analogous to values from a microarray experiment, with the difference that values from RNA-Seq data do not have constant variances. I understand constant variances here to mean that as the mean changes the variance does not.

1) Why is it required that the variance is constant across the mean?

2) Why does assuming constant variance work for microarrays?

voom • 1.5k views
4.5 years ago

Why is it required that the variance is constant across the mean?

My understanding is that limma is based on linear models. A classic linear model assumes there is no mean-variance trend (homoscedasticity). When there is one, the fit is more sensitive to the high-variance points than to the low-variance ones, and the standard error estimates are biased.
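To make the point about high-variance observations concrete, here is a minimal Python sketch (not limma code) comparing the variance of a plain average with that of an inverse-variance weighted average. The four variances are made up for illustration, and the function names are hypothetical. Down-weighting noisy observations in this way is the general idea behind precision weights.

```python
# Variance of an estimated mean when independent observations have
# unequal variances. The numbers are invented for illustration.

def var_unweighted_mean(variances):
    """Variance of the plain average of independent observations."""
    n = len(variances)
    return sum(variances) / n ** 2

def var_weighted_mean(variances):
    """Variance of the inverse-variance (precision) weighted average."""
    return 1.0 / sum(1.0 / v for v in variances)

# Three precise observations and one noisy one.
variances = [1.0, 1.0, 1.0, 25.0]

plain = var_unweighted_mean(variances)     # (1 + 1 + 1 + 25) / 16
weighted = var_weighted_mean(variances)    # 1 / (1 + 1 + 1 + 1/25)

print(f"unweighted: {plain:.3f}, precision-weighted: {weighted:.3f}")
```

The single noisy observation inflates the variance of the unweighted estimate, while the weighted estimate is barely affected by it.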

Why is it okay to assume the variance is constant in microarrays?

The data in a microarray are fluorescence intensities. They are fully continuous and approximately log-normal, so after a log transformation the variance is roughly constant across the intensity range. Sequencing data are counts, and so discrete rather than continuous. Strictly speaking you would expect them to follow a binomial distribution, be well approximated by a Poisson distribution, and look approximately log-normal at high enough read depths. But it turns out the counts are over-dispersed: the variance is higher than you would expect for a Poisson-distributed variable with a given mean.
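As a rough illustration of over-dispersion, the following Python sketch simulates Poisson counts and gamma-Poisson (negative binomial) counts with the same mean and compares their sample variances. The mean, dispersion and sample size are arbitrary values chosen for the example, not estimates from any dataset, and the gamma-Poisson mixture is one common way to model over-dispersed counts rather than something taken from the answer above.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def poisson(lam):
    """Draw one Poisson variate (Knuth's method; fine for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def overdispersed(mu, phi):
    """Gamma-Poisson mixture: mean mu, variance mu + phi * mu**2."""
    lam = rng.gammavariate(1.0 / phi, mu * phi)  # shape, scale
    return poisson(lam)

mu, phi, n = 20.0, 0.2, 5000  # illustrative values only
pois = [poisson(mu) for _ in range(n)]
nb = [overdispersed(mu, phi) for _ in range(n)]

pois_var = statistics.variance(pois)
nb_var = statistics.variance(nb)
print(f"Poisson:       mean {statistics.mean(pois):.1f}, variance {pois_var:.1f}")
print(f"Gamma-Poisson: mean {statistics.mean(nb):.1f}, variance {nb_var:.1f}")
```

Both samples have a mean near 20, but only the Poisson sample has a variance close to its mean; the mixture's variance is several times larger.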

I'm interested in a clustering analysis, not in differential expression. Since the constant variance requirement is needed (only) for the linear modeling part, is voom needed at all for clustering?

Yes, because clustering is driven by high-variance features, so you need to stabilize the variance. In DE the worry is that different subjects are in different variance regimes; in clustering the worry is that different genes are in different variance regimes. We tend to use rlog from the DESeq2 package before doing things like clustering. I don't know if voom would be equally effective, but my guess is yes.

This sounds true in theory, but Gordon Smyth says here that the precision weights generated by voom cannot be easily combined with the gene counts:

"I always recommend cpm(counts, log=TRUE, prior.count=3) for the purpose of other down-stream analyses, because the voom quantities cannot be summarized in single combined quantity."
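For readers who want to see what a log-cpm with a prior count roughly computes, here is a small Python sketch. It is not the edgeR implementation: the function name log_cpm is hypothetical, and the way the prior count is scaled by library size is an assumption based on my understanding of how edgeR's cpm behaves, so check the package documentation before relying on it.

```python
import math

def log_cpm(counts, lib_sizes, prior_count=3.0):
    """log2 counts-per-million for one gene across samples.

    Assumption: the prior count is scaled in proportion to each
    library size, and twice the scaled prior is added to the library
    size, so the ratio stays strictly between 0 and 1.
    """
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    out = []
    for count, lib in zip(counts, lib_sizes):
        scaled_prior = prior_count * lib / mean_lib
        ratio = (count + scaled_prior) / (lib + 2 * scaled_prior)
        out.append(math.log2(ratio * 1e6))
    return out

# One gene in three samples with equal library sizes (made-up numbers).
gene_counts = [0, 10, 1000]
lib_sizes = [1_000_000, 1_000_000, 1_000_000]
values = log_cpm(gene_counts, lib_sizes)

for count, value in zip(gene_counts, values):
    print(f"count {count:>4}: log-cpm {value:.3f}")
```

The prior count keeps zero counts finite on the log scale and damps the large variability of log values at low counts, which is why a larger prior.count is suggested for downstream uses such as clustering.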

In that case I recommend using rlog or vst.

rlog is a bit like cpm(counts, log=TRUE, prior.count=3) except the prior count is calculated in a principled way separately for each gene.
