Question

Error: Every gene contains at least one zero, cannot compute log geometric means

3

4.5 years ago

imrankhanbioinfo ▴ 70

Hi there,

When I run the DESeq pipeline on my dds object, I get this error message:

> dds_res<-DESeq(dds_PvsN)
estimating size factors
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  : 
  every gene contains at least one zero, cannot compute log geometric means
In addition: Warning message:
In class(object) <- "environment" :
  Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object

What strategy should I apply to resolve this error?

Thank you very much!

Imran

RNA-Seq DESeq R bioconductor • 29k views

ADD COMMENT • link updated 20 months ago by botloggy ▴ 10 • written 4.5 years ago by imrankhanbioinfo ▴ 70

0

Thanks, Kevin and swbarnes2, for your replies.

Kevin, your reply was the right solution. It lets us keep every sample in the analysis.

Adding a pseudo-count of 1 to each entry in my data resolved the error.

Best regards: Imran

ADD REPLY • link 4.5 years ago by imrankhanbioinfo ▴ 70

3

4.0 years ago

el24 ▴ 40

I encountered the same problem and fixed it with Kevin's first solution (adding a pseudo-count of 1), using the command below. I hope it helps anyone who faces this error in the future.

my_data[["RNA"]]@counts <- as.matrix(my_data[["RNA"]]@counts)+1

3

Is this single-cell data in your case (asking because it looks Seurat-ish to me)? Can you clarify what you aim to do? Other methods may fit better than DESeq2 here.

ADD REPLY • link 4.0 years ago by ATpoint 86k

1

Agreed. If this is single cell, I have a very hard time imagining running into this unless you have a very small number of cells or a handful of "cells" that are pretty much complete junk/empty droplets that should be easily removed by any sensible filtering.

ADD REPLY • link 4.0 years ago by jared.andrews07 ★ 18k

0

You are correct, this is Seurat code for scRNA-seq data. I want to get marker genes from the scRNA-seq data in this paper, which is pretty standard, but I am not sure why I face this problem. I previously used the Wilcoxon rank sum test, but I want to explore further and see which method works best for this data, which is why I am using DESeq2 now. I followed the Seurat tutorial and used its default filtering parameters and normalization. I saw that Kevin has provided another solution here, and I will try to see whether that works for me.

Please let me know if you have any ideas, I really appreciate it!

ADD REPLY • link 4.0 years ago by el24 ▴ 40

2

2.5 years ago

BioInfoBeginner ▴ 50

Adding a very amateur answer for any future users:

Remember to adjust any low-count gene filtering criteria after you change your script. I got this error after including a new variable and removing an NA entry in that variable, which took my sample size down to 19 while I was still filtering for 20 samples... hence the zeroes :D ha.

keep <- rowSums(counts(dds) >= 1) >= 20
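For anyone wanting to guard against this, here is a minimal base-R sketch on simulated counts (the matrix, the seed and the variable names are all made up for illustration). It shows that `rowSums(...)` can never exceed the number of columns, so a threshold of 20 on 19 samples keeps no genes at all, and an empty count matrix seems a plausible way to end up with this error. Clamping the threshold to `ncol()` is one assumed way to avoid that, not something taken from the original post.

```r
# Simulated counts: 50 genes (rows) x 19 samples (columns); made-up data.
set.seed(7)
counts <- matrix(rpois(19 * 50, lambda = 5), nrow = 50)

# Hard-coded threshold left over from when there were 20 samples.
min_samples <- 20
keep_stale <- rowSums(counts >= 1) >= min_samples
sum(keep_stale)  # no gene can pass: a row has at most 19 TRUE values

# Assumed fix: never ask for more samples than the matrix actually has.
min_samples_safe <- min(min_samples, ncol(counts))
keep <- rowSums(counts >= 1) >= min_samples_safe
sum(keep)  # number of genes retained with the clamped threshold
```

Clamping silently changes what the filter means, so failing loudly with `stopifnot(min_samples <= ncol(counts))` may be the better choice if the threshold was picked deliberately.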

1

4.5 years ago

swbarnes2 14k

Check your count matrix. That error can happen if you have a couple of rotten samples. Omitting them might fix things.
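One way to do that check, sketched in base R on a simulated matrix (the data, the sample names and the 0.9 cutoff are assumptions chosen purely for illustration): look at the fraction of zeros and the library size per sample, then see whether dropping the suspicious samples leaves genes without any zero.

```r
# Simulated counts: 10 genes x 6 samples, with one deliberately failed library.
set.seed(42)
counts <- matrix(
  rpois(60, lambda = 20), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:6))
)
counts[, "sample4"] <- 0  # the "rotten" sample

# Per-sample diagnostics: fraction of zero counts and total counts.
zero_fraction <- colMeans(counts == 0)
library_size  <- colSums(counts)
print(data.frame(zero_fraction, library_size))

# How many genes have no zero in any sample?
genes_without_zero <- function(m) sum(rowSums(m == 0) == 0)

before <- genes_without_zero(counts)
# 0.9 is an arbitrary illustrative cutoff, not a recommended value.
after  <- genes_without_zero(counts[, zero_fraction < 0.9, drop = FALSE])
c(before = before, after = after)
```

If `before` is 0 and `after` is clearly positive, a handful of samples is probably responsible, and omitting them is worth considering before reaching for a pseudo-count.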

score 13 · Accepted Answer · 2020-05-26

13

4.5 years ago

Kevin Blighe 88k

I encountered this error only once in the past. It is as stated: every gene in your data has at least one zero value, and this creates an issue for the size-factor calculation.
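To make the mechanism concrete, here is a small base-R illustration on a toy matrix (the numbers are made up, and this is a simplified picture of a median-of-ratios style calculation rather than DESeq2's actual source). The log geometric mean of a gene is the mean of its log counts, and because `log(0)` is `-Inf`, a single zero makes the whole value `-Inf`.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
counts <- matrix(
  c(10,  0, 12,
     0, 25, 30,
     7,  9,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:3))
)

# Log geometric mean per gene: the mean of the log counts across samples.
# log(0) is -Inf, so one zero in a row drags that row's mean to -Inf.
log_geo_means <- rowMeans(log(counts))
print(log_geo_means)

# Only genes with a finite value can serve as a reference for size factors.
usable <- is.finite(log_geo_means)
sum(usable)  # 0 here: every gene has a zero, so no reference genes remain
```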

Solutions:

add a pseudo-count value of '1' to your data
use: dds_PvsN <- estimateSizeFactors(dds_PvsN, type = 'iterate'), then run DESeq() on the result
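A hedged sketch of both options on simulated data, assuming DESeq2 is installed. The dataset comes from `makeExampleDESeqDataSet()` with zeros forced in, and the object names are illustrative stand-ins for `dds_PvsN`. The sketch runs `type = "poscounts"`, another zero-tolerant option of `estimateSizeFactors()`, and leaves `type = "iterate"` as a commented alternative.

```r
library(DESeq2)

# Simulated stand-in for the real data: 200 genes, 6 samples, 2 conditions.
set.seed(1)
example_dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# Force one zero into every gene to reproduce the situation in the question.
cts <- counts(example_dds)
zero_col <- sample(ncol(cts), nrow(cts), replace = TRUE)
cts[cbind(seq_len(nrow(cts)), zero_col)] <- 0L
dds_zero <- DESeqDataSetFromMatrix(cts, colData(example_dds), design = ~ condition)

# The default "ratio" estimator is expected to stop with the error above.
default_fails <- inherits(try(estimateSizeFactors(dds_zero), silent = TRUE),
                          "try-error")

# Solution 1: add a pseudo-count of 1 to every entry, then proceed as usual.
dds_pseudo <- DESeqDataSetFromMatrix(cts + 1L, colData(example_dds),
                                     design = ~ condition)
dds_pseudo <- estimateSizeFactors(dds_pseudo)

# Solution 2: keep the counts and switch the size-factor estimator.
# The result has to be assigned back, otherwise it is discarded.
dds_zero <- estimateSizeFactors(dds_zero, type = "poscounts")
# Alternative from the answer: estimateSizeFactors(dds_zero, type = "iterate")

sizeFactors(dds_pseudo)
sizeFactors(dds_zero)
# A later DESeq(dds_zero) call should reuse these stored size factors.
```

Neither route says anything about why every gene has a zero, so it is probably worth checking the samples themselves before trusting the downstream results.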

Kevin

6

I just want to add as a comment that this is a technical solution, but it is unclear what the implications are for the downstream analysis, which in turn depends on the analysis goal. If in normal RNA-seq there is at least one zero per gene, that means (I guess) that either the samples are notably under-sequenced or there are other kinds of dropout events that I'd investigate. It is in any case not normal and should probably not be ignored by just adding a pseudocount. If this is single-cell data, one might consider single-cell-specific normalization methods such as the deconvolution method in the scran package.

ADD REPLY • link 4.0 years ago by ATpoint 86k

0

Does DESeq2 not specify un-normalized counts?

ADD REPLY • link 2.9 years ago by robert.murphy ▴ 90

0

It does, and you should use them if possible. The error in the OP is usually the result of bad samples (or of applying DESeq to a single-cell dataset with bad cells).