Using offsets in edgeR to correct GC content
1
0
Entering edit mode
8.5 years ago
Dawe ▴ 270

Hi all, I'm performing some differential analysis with edgeR and I need to take into account GC content in my regions. From edgeR manual I see I should set GC percentage as "offset" in linear model. Unfortunately there is very little documentation about this, so I need some hint.

Offsets can be passed both to estimateDisp and glmFit functions, but it is not clear how they are handled in the two steps. Setting offset in estimateDisp, i.e.

y = DGEList(…)
y = calcNormFactors(y)
y$offset = gc_content #this is a vector with %GC for every region in y 
y = estimateDisp(y, design)

increases the dispersion values (common, tagwise and trended) in the DGEList object, with negative effect on final pvalues and FDR. Setting it at GLM, i.e.

y = DGEList(…)
y = calcNormFactors(y)
y = estimateDisp(y, design)
fit = glmFit.default(y, design, offset=gc_content, dispersion=y$tagwise.dispersion)

does exactly the opposite: all p-values and FDR become "incredibly" significant, also average logCPM for each region is increased 10-fold (which, I suspect, is the reason why I have lower p-values).

What would be the proper way to include GC information in edgeR?

d

edger gc linear models R • 3.3k views
ADD COMMENT
0
Entering edit mode

Do you have a notable GC bias between some of the samples? If not, don't bother doing this.

ADD REPLY
0
Entering edit mode

Influence of GC in standard differential expression analysis is limited, since you are comparing gene A versus gene A. Even if it's a high GC gene, it's the same for both samples and therefore no need to correct for.

ADD REPLY
1
Entering edit mode
8.2 years ago

Check the EDASeq manual (http://www.bioconductor.org/packages/release/bioc/html/EDASeq.html). You will find a perfect example for your question.

ADD COMMENT

Login before adding your answer.

Traffic: 2119 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6