edger differential expression analysis error
1
0
Entering edit mode
5.8 years ago
Janey ▴ 30

Hi

I use edger with no replicate methods for differential expression analysis. This method works well for some of my samples, and for some others i get this error:

library (edgeR)

x <- read.delim ("quant.txt",row.names="Name")
group<-factor (c("T","C"))
design <- model.matrix(~group)
y <- DGEList (counts=x,group=group)
y <- calcNormFactors(y)
keep <- rowSums(cpm(y)>0.1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y1 <- y
y1$samples$group <- 1
y0 <- estimateCommonDisp(y1[2,], trend="none", tagwise=FALSE)

There were 50 or more warnings (use warnings() to see the first 50)

If you help me with this problem, i will be very grateful.

R • 3.2k views
ADD COMMENT
1
Entering edit mode

These are warnings not errors. Related post at SO: https://stackoverflow.com/q/37826523/680068

ADD REPLY
0
Entering edit mode

Thank you very much

My problem was solved by changing the sort of my data.

ADD REPLY
1
Entering edit mode
ADD REPLY
0
Entering edit mode

Thank you for pointing this out.

ADD REPLY
0
Entering edit mode

Can you include which warnings have been issued?

ADD REPLY
0
Entering edit mode

I'm new to working with R and I'm not very familiar with their codes.

this error:

There were 50 or more warnings (use warnings() to see the first 50)

ADD REPLY
0
Entering edit mode

Okay...so did you try typing "warnings()"?

ADD REPLY
0
Entering edit mode

Yes.

warnings()

Warning messages:

1: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
2: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
3: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
4: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
5: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
6: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
7: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
8: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
9: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
10: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
11: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
12: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
13: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
14: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
15: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
16: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
17: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
18: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
19: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
20: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
21: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
22: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
23: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
24: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
25: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
26: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
27: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
28: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
29: In condLogLikDerSize(y, r,
ADD REPLY
0
Entering edit mode

Hello Janey!

We believe that this post does not fit the main topic of this site.

Received an expert's opinion at Bioconductor.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY
0
Entering edit mode

I respectfully disagree - it is good to provide links to sources other than Biostars, and this is a relevant topic for Biostars discussions.

Also, even if there is a solution, it is possible that others may want to provide commentary from their experiences (and I also don't think everything was confirmed to be OK by the person who posted the question).

So, I re-opened this thread.

ADD REPLY
0
Entering edit mode

You are free to do so. Still I think discussions should not be spread over two different communities, and BioC is more appropriate as the maintainers of the package are more active in that community.

ADD REPLY
0
Entering edit mode

I am certainly more active in Biostars than BioC (although I have a separate discussion group for COHCAP). However, that example is admittedly a little weird because the discussion for the Bioconductor implementation is on SourceForge (although people could potentially ask comments on the GitHub branch, with the limitation that they wouldn't see all the earlier questions).

So, for a given program, I think it is helpful to have one place to see questions directly answered by the developer.

However, I am trying to get a better feel for what are the best long-term support systems, and I don't think people should be excluded from asking questions related to Bioinformatics Analysis on 3rd party discussion forums.

For example, on Biostars, perhaps somebody asking a question about edgeR may get some related feedback about how to do something similar in DESeq2. Or, going back to the COHCAP example, I think people should test out different programs for each project (if it is BS-Seq, I would recommend starting with both COHCAP and methylKit). So, I think there can also be value in getting feedback from users that aren't the developer.

ADD REPLY
0
Entering edit mode

I am certainly more active in Biostars than BioC

I was not referring to you.

(...) and I don't think people should be excluded from asking questions related to Bioinformatics Analysis on 3rd party discussion forums.

None of my post argues against that. Here it is all about cross-posting which spreads informaton over two (or more) communities.

So, I think there can also be value in getting feedback from users that aren't the developer.

Again true, and again I was not arguing against that, but against cross-posting.

ADD REPLY
0
Entering edit mode

Ok - I think I understand the concern a little better now:

I agree there can be an issue with cross-posting, but I am not sure if closing the post is necessary the best solution.

More specifically, it is my opinion that labs need to plan ahead for a considerable amount of time to spend on analysis, and they need to gradually expand their skill sets. So, if somebody is in the middle of a project without enough prior preparation, then I think that is an issue. If somebody is cross-posting a lot because they feel very uncomfortable with analysis of their data, I think that is an issue.

If the response time for the developer is slow, that can also be an issue (but that is part of what I am trying to work out, and something for which I unfortunately don't have an immediate solution that I can spell out precisely).

If somebody has one Biostars post and they create an almost identical Biostars post, I think closing the 2nd post with a link to the first post is OK. However, if the set of people giving feedback on Biostars is different than BioC, I think issuing a warning without closing the question may be a better solution.

Sorry to be nit-picky - I certainly appreciate all of your help with moderating the Biostars discussions :)

ADD REPLY
2
Entering edit mode
5.8 years ago
Gordon Smyth ★ 7.6k

The basic problem is that you have asked estimateCommonDisp to estimate the negative binomial dispersion from just one gene, by passing just the 2nd row of data y1[2,] instead of the whole data object. Why would you do that?

Since y1 has only 2 columns, you are asking estimateCommonDisp to estimate the negative binomial dispersion from a single row of two numbers. That's a lot to ask.

Even that would usually run ok without any warnings or errors, so I am guessing that the 2nd row of y1 contains two counts that add up to be less than or equal to 5. By default, estimateCommonDisp filters out genes with total count less than or equal to 5. So estimateCommonDisp has filtered out the one row of data that was input, meaning that it has tried to estimate the dispersion from a matrix with zero rows, and that has lead to the warning you see. I have added a check to the function so that this problem will in future generate a more informative error message.

In summary, there are several errors in your code:

  1. You've subsetted y1 for no reason, throwing away almost all the data.
  2. Your filtering with rowSums and cpm is not stringent enough, keeping genes for which all the counts are too small to do any reasonable inference with. Please use the edgeR function filterByExpr instead.
  3. The estimateCommonDisp function doesn't have arguments trend or tagwise.

More correct code for what you are trying to do is given here: https://support.bioconductor.org/p/117407/#117418

Please don't cross-post the same question at the same time to multiple lists. I've had to answer your question twice, once here and again on Bioconductor.

ADD COMMENT

Login before adding your answer.

Traffic: 1631 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6