Question

RNA-seq gene expression analysis using 0-counts

10.0 years ago

johntlovell ▴ 10

Hi Folks.

I am conducting a differential gene expression analysis using RNA-seq. My experimental design is blocked and repeated, so I need to fit mixed-effects models and cannot use standard DGE packages such as DESeq, edgeR, etc. This is not a problem when the count data can be modeled with a negative binomial (or Poisson, etc.) distribution; however, many of the genes have highly zero-inflated or effectively binary count data. For example, many genes have 0 counts for one parent and >5 counts for the other parent. Please advise on the best way to analyze genes that behave this way.

Thanks, John
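
To make the "0 counts for one parent and >5 counts for the other" pattern concrete, here is a minimal Python sketch that flags such genes before any modeling. Everything in it is an assumption for illustration: the function name `flag_on_off_genes`, the sample and gene names, and the toy counts are hypothetical, and the threshold of 5 is taken only from the example in the question.

```python
def flag_on_off_genes(counts, parent_of, min_on=5):
    """Flag genes that are silent in one parent and expressed in another.

    counts:    dict mapping gene -> {sample: raw count} (hypothetical layout)
    parent_of: dict mapping sample -> parent label
    min_on:    a parent counts as "expressed" if every sample has > min_on reads
    """
    flagged = {}
    for gene, by_sample in counts.items():
        # Group this gene's counts by parent.
        per_parent = {}
        for sample, n in by_sample.items():
            per_parent.setdefault(parent_of[sample], []).append(n)
        # Parents with no reads at all, and parents with reads in every sample.
        silent = sorted(p for p, v in per_parent.items() if all(n == 0 for n in v))
        expressed = sorted(p for p, v in per_parent.items() if all(n > min_on for n in v))
        if silent and expressed:
            flagged[gene] = {"silent": silent, "expressed": expressed}
    return flagged


# Toy data, invented for illustration only.
parent_of = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "B3": "B"}
counts = {
    "gene1": {"A1": 0, "A2": 0, "A3": 0, "B1": 12, "B2": 9, "B3": 15},
    "gene2": {"A1": 30, "A2": 25, "A3": 41, "B1": 28, "B2": 35, "B3": 22},
    "gene3": {"A1": 0, "A2": 3, "A3": 0, "B1": 0, "B2": 7, "B3": 1},
}
flagged = flag_on_off_genes(counts, parent_of)
print(flagged)
```

Only `gene1` is flagged here. Separating these presence/absence genes up front lets them be handled apart from the genes that a count model fits reasonably well; how to test them is a separate question.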

RNA-Seq • 3.5k views

updated 2.8 years ago by Ram 44k • written 10.0 years ago by johntlovell ▴ 10

Are you sure you actually need to use a mixed-effect model? Given that DESeq2/edgeR/etc. use shrinkage, a mixed-effect model is unlikely to benefit you.
Have a look at limma's duplicateCorrelation() function.

10.0 years ago by Devon Ryan 104k

Thanks Devon. This comment has come up in many of the posts that I have read.

For me, when an experiment is designed with blocking and with replication within the individual, both the individual and the experimental block must be analyzed as random effects. This is a pretty standard quantitative genetics design. Furthermore, we have a ton of replication within the experimental factors we are testing, so I am not convinced that shrinkage is a particularly good way to estimate within-group variances.

Anyway, even if I did use fixed effects, I am still unsure about the best way to analyze these highly zero-inflated and binary gene expression phenotypes. Thanks again.

updated 2.8 years ago by Ram 44k • written 10.0 years ago by johntlovell ▴ 10

Certainly, if you were to compare a straight GLM and a GLMM on your dataset, the GLMM would work better. But a GLMM is just doing shrinkage in a different way than DESeq2 et al., which aren't straight GLMs.

Regarding the zeroes, it depends a bit on exactly what you mean by zero-inflated and where the problem is. If you have absolutely 0 expression in all but one sample, then that can be problematic. I suppose how to deal with that depends on whether you find those cases biologically interesting. For most people they wouldn't be, but I can think of counterexamples (e.g., single-cell sequencing).
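
As a rough way to see "where the problem is", the Python sketch below sorts genes by their zero pattern, separating the "zero in all but one sample" case from genes that are zero everywhere or expressed in several samples. The function name `zero_pattern`, the category labels, and the toy counts are hypothetical and only meant as an illustration.

```python
from collections import Counter


def zero_pattern(sample_counts):
    """Classify one gene by how many samples have a non-zero count."""
    nonzero = sum(1 for c in sample_counts if c > 0)
    if nonzero == 0:
        return "all_zero"
    if nonzero == 1 and len(sample_counts) > 1:
        return "single_sample"  # zero in all but one sample
    return "multiple_samples"


# Toy counts (one list per gene, one entry per sample), invented for illustration.
counts = {
    "geneA": [0, 0, 0, 0, 0, 17],
    "geneB": [0, 0, 0, 0, 0, 0],
    "geneC": [0, 4, 0, 9, 0, 6],
    "geneD": [12, 15, 9, 11, 14, 10],
}
patterns = {gene: zero_pattern(c) for gene, c in counts.items()}
summary = Counter(patterns.values())
print(patterns)
print(summary)
```

A tally like this shows how many genes fall into the problematic single-sample category, which helps in deciding whether they are worth special handling or can simply be filtered out.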

updated 2.8 years ago by Ram 44k • written 10.0 years ago by Devon Ryan 104k