ANOVA for RNA-seq data?
5.0 years ago

Hello

I have seen some platforms, such as GEPIA, offering ANOVA for differential gene expression analysis. However, as far as I understand, ANOVA compares group means and assumes that the samples share the same distribution and variance. As far as I have been led to believe, that is uncommon for any kind of RNA-seq derived data, especially considering the thousands of genes that may be expressed in the human genome. Is ANOVA really appropriate for differential expression?

anova RNA-Seq • 11k views
5.0 years ago

Nope. Also, RNA-seq data are not normally distributed, which ANOVA assumes. You should use purpose-built tools such as edgeR, DESeq2 or limma.


limma is actually based on linear models/ANOVA under the hood. While raw RNA-seq counts are never normally distributed, limma works on log-transformed CPM values, which are normal enough for ANOVA. So yes, it is possible to analyze RNA-seq data with ANOVA, but I agree that it is rather sub-optimal compared to more modern methods such as DESeq2 and edgeR, which are based on negative binomial modeling of the raw counts.
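To make the log-CPM step concrete, here is a minimal sketch of the transform voom applies before fitting linear models: log2((count + 0.5) / (library size + 1) * 1e6). The function name `log_cpm` and the example counts and library sizes are hypothetical, and the sketch covers only the transform, not the precision weights that voom also estimates.

```python
import math


def log_cpm(counts, lib_sizes):
    """Return voom-style log2 counts per million for a genes x samples matrix.

    counts    -- list of rows, one row of raw counts per gene
    lib_sizes -- total read count per sample (normally the column sums of
                 the full count matrix; supplied by hand here)
    """
    return [
        # The 0.5 and 1.0 offsets keep zero counts finite after the log.
        [math.log2((c + 0.5) / (n + 1.0) * 1e6) for c, n in zip(row, lib_sizes)]
        for row in counts
    ]


# Hypothetical data: 2 genes, 6 samples.
counts = [
    [0, 3, 1, 40, 35, 52],
    [120, 98, 110, 115, 130, 101],
]
lib_sizes = [1_000_000, 900_000, 1_100_000, 1_000_000, 950_000, 1_050_000]

logged = log_cpm(counts, lib_sizes)
for row in logged:
    print([round(v, 2) for v in row])
```

A zero count maps to a finite (negative) value instead of minus infinity, which is what makes a linear model on the log scale workable.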


The issue is not so much "normality" (limma-voom and sleuth don't do negative binomial modeling and they work super well). The issue lies with variance estimation, which is why limma does not use t-tests/ANOVA in the traditional sense: it uses those tests but regularizes the per-gene variance estimates, which is necessary in almost all cases.

Most differential gene expression packages support ANOVA-like comparisons, so just stick with those.
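As a rough illustration of what regularizing the variance means, the sketch below shrinks a gene's sample variance toward a prior value using the posterior-variance formula from limma's moderated t-statistic: s2_post = (d0 * s0^2 + d * s^2) / (d0 + d). In limma the prior variance s0^2 and prior degrees of freedom d0 are estimated from all genes by empirical Bayes; here `prior_var` and `prior_df` are hypothetical fixed inputs, and the function names and data are made up for the example.

```python
import math
from statistics import mean


def pooled_variance(a, b):
    """Pooled residual variance of two groups; returns (variance, residual df)."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    df = len(a) + len(b) - 2
    return ss / df, df


def moderated_t(a, b, prior_var, prior_df):
    """Two-group t-statistic computed with a shrunken variance.

    prior_var and prior_df are assumed to be given; limma would estimate
    them from the whole data set.
    """
    s2, df = pooled_variance(a, b)
    # Weighted average of the prior variance and the gene's own variance.
    s2_post = (prior_df * prior_var + df * s2) / (prior_df + df)
    se = math.sqrt(s2_post * (1 / len(a) + 1 / len(b)))
    return (mean(b) - mean(a)) / se, s2, s2_post


# Hypothetical log2 expression for one gene, 3 replicates per group.
# The replicates happen to agree very closely, so the raw variance is tiny.
ctrl = [5.01, 5.00, 5.02]
treat = [5.30, 5.31, 5.29]

t_mod, s2, s2_post = moderated_t(ctrl, treat, prior_var=0.05, prior_df=4.0)
t_plain = (mean(treat) - mean(ctrl)) / math.sqrt(s2 * (1 / 3 + 1 / 3))
print(f"raw variance {s2:.5f}, shrunken variance {s2_post:.5f}")
print(f"ordinary t {t_plain:.1f}, moderated t {t_mod:.2f}")
```

With only three replicates per group, a gene can show a very small variance by chance and produce a huge ordinary t-statistic for a modest difference; borrowing information across genes pulls that variance back toward a typical value.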


Yes, I agree that limma is not 'classical' ANOVA but rather an extension of it. Still, I wanted to add some nuance to the clear-cut answer above, which states that ANOVA cannot be used for RNA-seq analysis because the counts are not normal.

Also, my understanding is that, without the log transformation of the count data, normality would be an issue for linear model/ANOVA-based methods such as limma, but not for edgeR or DESeq2, since they make different distributional assumptions about the data.


Hi Carlo, sorry to pop in. I have a somewhat non-classical problem. I am not an informatics person, and I am using a tool called Partek Genomics Suite to give a collaborator an extremely rough view of his scRNA-seq data. Partek does not (as far as I can tell) include edgeR or DESeq2. The informatics guys might get to this data next week, but my collaborator needs to show his PI just a peek this weekend. I have RNA-seq CPM values, which are, obviously, often zero. Is the following reasonable?

1) Make the zero values non-zero (a very small number)
2) log2 transform
3) Run ANOVA

I know this is definitely not kosher, but could I at least stack-rank the genes by p-value or fold-change to give a fuzzy picture of the biology?
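For reference, the three steps in this question can be sketched in plain Python as below. This is a rough exploratory ranking under stated assumptions, not a recommended differential expression method: the gene names, CPM values and group labels are made up, the helper names `one_way_f` and `rank_genes` are hypothetical, and the default pseudocount of 1 is an assumption rather than something from the thread (a very small pseudocount pushes zeros to large negative log values and inflates the within-group variance). The sketch ranks by the classical F statistic only; converting it to a p-value needs the F distribution, which `scipy.stats.f_oneway` provides if SciPy is available.

```python
import math


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for a list of groups of values."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        # No within-group spread: F is undefined, so report the extreme.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_genes(cpm, labels, pseudocount=1.0):
    """Rank genes by F statistic on log2(CPM + pseudocount), largest first."""
    ranked = []
    for gene, row in cpm.items():
        logged = [math.log2(v + pseudocount) for v in row]  # steps 1 and 2
        groups = {}
        for value, label in zip(logged, labels):
            groups.setdefault(label, []).append(value)
        ranked.append((gene, one_way_f(list(groups.values()))))  # step 3
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical CPM values for two genes across six samples.
cpm = {
    "geneA": [0.0, 0.0, 1.0, 30.0, 25.0, 40.0],
    "geneB": [10.0, 12.0, 9.0, 11.0, 10.0, 12.0],
}
labels = ["ctrl", "ctrl", "ctrl", "treat", "treat", "treat"]

ranking = rank_genes(cpm, labels)
for gene, f_stat in ranking:
    print(gene, round(f_stat, 2))
```

Because the variance here is estimated per gene with no regularization, the ordering is best read as a fuzzy first look rather than a list of significant genes.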


You should ask this as a separate question. scRNA-seq is not my specialty, so others might provide better answers.

3.6 years ago

According to A Beginner’s Guide to Analysis of RNA Sequencing Data (https://www.atsjournals.org/doi/10.1165/rcmb.2017-0430TR) ANOVA is an appropriate analysis for RNA-seq data. However, the review doesn't specify a tool/package to do this analysis. Searching for how to do ANOVA on RNA-seq data brought me to this page.


Just because a paper does it doesn't mean it's a correct or statistically sound approach.

Even if you derive log-normal abundances from your RNA-seq data, using t-tests or ANOVAs to find DE genes is still problematic.
