Question

Requirement of constant variance in limma-voom

1

5.1 years ago

Aspire ▴ 390

In the voom article it is stated that log-cpm of RNA-Seq data can be treated as analogous to values from a microarray experiment, with the difference that values from RNA-Seq data do not have constant variances. I understand constant variances here to mean that as the mean changes the variance does not.

1) Why is it required that the variance is constant across the mean?

2) Why does assuming constant variance work for microarrays?

voom • 1.8k views

ADD COMMENT • link updated 5.1 years ago by i.sudbery 22k • written 5.1 years ago by Aspire ▴ 390

score 3 · Answer 1 · 2020-06-17

3

5.1 years ago

i.sudbery 22k

Why is it required that the variance is constant across the mean

My understanding is that limma is based on linear models. A classic linear model requires that there is no mean/variance trend, because with such a trend the regression is more sensitive to changes in the high-variance points than in the low-variance ones, and the standard error estimates become biased.
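As a rough illustration of why unequal variances matter for a fit, the sketch below estimates a single mean from observations with two very different noise levels, once with a plain average and once with inverse-variance weights (the general idea behind precision weights such as those voom produces). The noise levels, sample sizes and function name are assumptions made up for this example and are not taken from limma.

```python
import random
import statistics

def simulate_estimates(n_reps=500, true_mean=10.0, seed=1):
    """Compare unweighted and inverse-variance weighted estimates of a mean."""
    rng = random.Random(seed)
    # Hypothetical design: five precise observations and five noisy ones.
    sigmas = [0.5] * 5 + [5.0] * 5
    weights = [1.0 / s ** 2 for s in sigmas]  # inverse-variance weights
    unweighted, weighted = [], []
    for _ in range(n_reps):
        obs = [rng.gauss(true_mean, s) for s in sigmas]
        # Plain average: every observation counts equally.
        unweighted.append(sum(obs) / len(obs))
        # Weighted average: noisy observations are down-weighted.
        weighted.append(sum(w * y for w, y in zip(weights, obs)) / sum(weights))
    # Spread of each estimator across the repeated simulations.
    return statistics.variance(unweighted), statistics.variance(weighted)

var_unweighted, var_weighted = simulate_estimates()
print(f"variance of unweighted estimate: {var_unweighted:.3f}")
print(f"variance of weighted estimate:   {var_weighted:.3f}")
```

In this toy setting the unweighted estimate is dominated by the noisy observations, which is the kind of sensitivity described above.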

Why is it okay to assume the variance is constant in microarrays

The data in a microarray is fluorescent intensity data. It is fully continuous and has a log-normal distribution. Sequencing data is based on counts and is discrete, not continuous. Strictly speaking, you would expect it to follow a binomial distribution, to be well approximated by a Poisson distribution, and to look approximately log-normal at high enough read depths. But it turns out to be over-dispersed: the variance is higher than you would expect for a Poisson-distributed variable with the same mean.
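To make "over-dispersed" concrete, here is a small Python simulation comparing Poisson counts with gamma-Poisson (negative binomial) counts that have the same mean. The mean of 50, the dispersion of 0.2 and the helper names are assumptions chosen for illustration, not values estimated from any real RNA-Seq data set.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def overdispersed_count(rng, mean, dispersion):
    """Gamma-Poisson draw: variance = mean + dispersion * mean**2."""
    shape = 1.0 / dispersion
    # Gamma with this shape and scale has the requested mean.
    lam = rng.gammavariate(shape, mean * dispersion)
    return poisson(rng, lam)

rng = random.Random(42)
mean, dispersion, n = 50.0, 0.2, 2000  # assumed values for the demo

pois = [poisson(rng, mean) for _ in range(n)]
nb = [overdispersed_count(rng, mean, dispersion) for _ in range(n)]

pois_var = statistics.variance(pois)
nb_var = statistics.variance(nb)
print(f"Poisson:        mean={statistics.mean(pois):.1f} variance={pois_var:.1f}")
print(f"Gamma-Poisson:  mean={statistics.mean(nb):.1f} variance={nb_var:.1f}")
```

For the Poisson draws the variance stays close to the mean, while the gamma-Poisson draws have a much larger variance at the same mean.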

ADD COMMENT • link 5.1 years ago by i.sudbery 22k

0

I'm interested in a clustering analysis, not in differential expression. Since the constant variance requirement is needed (only) for the linear modeling part, is voom needed at all for clustering?

ADD REPLY • link 5.1 years ago by Aspire ▴ 390

0

Yes, because clustering is driven by high-variance features, so you need to stabilize the variance. In DE the worry is that different subjects are in different variance regimes; in clustering the worry is that different genes are in different variance regimes. We tend to use rlog from the DESeq2 package before doing things like clustering. I don't know if voom would be equally effective, but my guess is yes.

ADD REPLY • link 5.1 years ago by i.sudbery 22k

0

This sounds true in theory, but Gordon Smyth says here that the precision weights generated by voom cannot be easily combined with the gene counts.

I always recommend cpm(counts, log=TRUE, prior.count=3) for the purpose of other down-stream analyses, because the voom quantities cannot be summarized in single combined quantity.
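For readers unfamiliar with what a log-cpm with a prior count does, here is a simplified Python sketch of the idea: add a small offset to each count before taking log2 of counts per million. This is not edgeR's implementation, which differs in details such as how the prior count is scaled across libraries; `simple_log_cpm`, `log_fold_change` and the toy counts are hypothetical and exist only for this illustration.

```python
import math

def simple_log_cpm(counts, prior_count=3.0):
    """Simplified log2 counts-per-million for one sample.

    counts: raw read counts, one per gene.
    prior_count: offset added to every count, which avoids log(0)
    and damps the variability of log values for low counts.
    """
    lib_size = sum(counts)
    # Assumption: the library size is padded by twice the prior count.
    adjusted_lib_size = lib_size + 2.0 * prior_count
    return [math.log2((c + prior_count) / adjusted_lib_size * 1e6) for c in counts]

# Two toy samples with identical library sizes (1,000,000 reads each).
sample_a = [1, 100, 999_899]
sample_b = [4, 100, 999_896]

def log_fold_change(gene, prior_count):
    """log2 fold change of one gene between the two toy samples."""
    a = simple_log_cpm(sample_a, prior_count)[gene]
    b = simple_log_cpm(sample_b, prior_count)[gene]
    return b - a

lfc_small_prior = log_fold_change(0, prior_count=0.25)
lfc_large_prior = log_fold_change(0, prior_count=3.0)
print(f"gene 0 logFC, prior 0.25: {lfc_small_prior:.2f}")
print(f"gene 0 logFC, prior 3:    {lfc_large_prior:.2f}")
```

The larger prior count shrinks the apparent fold change for the gene with very low counts, which is why it helps keep low-count genes from dominating a clustering.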

ADD REPLY • link 5.1 years ago by Aspire ▴ 390

1

In which case I recommend using rlog or vst then.

rlog is a bit like cpm(counts, log=TRUE, prior.count=3) except the prior count is calculated in a principled way separately for each gene.

ADD REPLY • link 5.1 years ago by i.sudbery 22k