DE genes and variability between biological replicates: edgeR
5.0 years ago

Hi guys,

I'm working with DE genes from an edgeR analysis comparing two experimental groups (n=3 per group). I noticed that some DEGs show high variability within replicates. Do you think that the program takes into account this aspect? Are these DEGs reliable or not? If not, how to filter the edgeR output for being more reliable? Is there specific code in edgeR to cope with this issue?

Thanks for your help!

Best, Marianna

RNA-Seq gene • 1.4k views
ADD COMMENT
1
Entering edit mode

Do you think that the program takes into account this aspect?

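Yes, accounting for within-group variability is a key feature of (basically any) statistical approach. In edgeR it enters through the dispersion estimated for each gene.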

Are these DEGs reliable or not?

Can you give a concrete example, including the expression values and p-values of such a gene? Even if the dispersion is higher for certain genes, a gene can still be statistically significant when its expression level is decent and the fold change is large.

If not, how to filter the edgeR output for being more reliable?

You can always lower the FDR cutoff to be more conservative, but at the cost of more false negatives.
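To make the trade-off concrete, here is a minimal base-R sketch. The p-values are simulated (hypothetical, not from the poster's data); it only illustrates that a stricter cutoff on Benjamini-Hochberg adjusted p-values (the adjustment edgeR's topTags applies by default) can only shrink the list of calls.

```r
set.seed(42)

# Hypothetical p-values: 900 null genes (uniform) plus 100 genes with signal
p <- c(runif(900), rbeta(100, 0.5, 20))

# Benjamini-Hochberg adjustment, the default used by edgeR's topTags()
fdr <- p.adjust(p, method = "BH")

# Number of genes called at a lenient and at a stricter cutoff
n_10 <- sum(fdr < 0.10)
n_05 <- sum(fdr < 0.05)
c(FDR_0.10 = n_10, FDR_0.05 = n_05)
```

Every gene that passes FDR < 0.05 also passes FDR < 0.10, so tightening the cutoff trades some true positives for fewer expected false positives.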

Please also share the code you used. Did you use filterByExpr?

ADD REPLY
0
Entering edit mode

Sorry, I made a mess of the comments; see the reply below.

ADD REPLY
0
Entering edit mode

Hi ATpoint,

thank you for your quick reply.

I'm talking about the results of an analysis conducted in the past with an edgeR version that lacked the filterByExpr function. Anyway, I filtered the data with the approach suggested by the manual of that version (see below).

This is the code:

    # keep genes with CPM > 0.5 in at least 3 samples
    keep <- rowSums(cpm(dge) > 0.5) >= 3
    dge <- dge[keep, , keep.lib.sizes=FALSE]
    dge_norm <- calcNormFactors(dge)
    dge_norm$samples
    # note: makeContrasts below needs the design column names to match the group names
    design <- model.matrix(~0+Group)
    dge2 <- estimateDisp(dge_norm, design)
    fit <- glmFit(dge2, design)
    my.contrast <- makeContrasts(
        TreatvsControl.1h = Treated.1h - Control.1h,
        TreatvsControl.6h = Treated.6h - Control.6h,
        Interaction = (Treated.1h - Control.1h) - (Treated.6h - Control.6h),
        levels = design)
    lrt1 <- glmLRT(fit, contrast=my.contrast[,"TreatvsControl.1h"])

This is an example of a DEG with a high SD (values are expressed in CPM):

group 1: 0.256892, 0.321829, 0.06487
group 2: 5.078998, 0.367778, 1.278966
logFC: 3.28; logCPM: -0.088; LR: 12.41; p-val: 0.00042; FDR: 0.075
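A quick base-R look at the CPM values quoted above shows where the variability comes from. This is only a rough sketch: the naive log2 ratio of group means is not edgeR's logFC (which is estimated from the counts, library sizes and a prior count), and the helper name `summarise_group` is made up for illustration.

```r
# CPM values copied from the post
g1 <- c(0.256892, 0.321829, 0.06487)
g2 <- c(5.078998, 0.367778, 1.278966)

# Hypothetical helper: mean, SD and coefficient of variation per group
summarise_group <- function(x) {
  c(mean = mean(x), sd = sd(x), cv = sd(x) / mean(x))
}
rbind(group1 = summarise_group(g1), group2 = summarise_group(g2))

# Naive log2 ratio of group means (not the edgeR estimate)
naive_logfc <- log2(mean(g2) / mean(g1))

# Share of the group 2 total carried by its largest replicate
top_share <- max(g2) / sum(g2)

# Same naive ratio after dropping that replicate
logfc_without_top <- log2(mean(g2[-which.max(g2)]) / mean(g1))

c(naive_logfc = naive_logfc, top_share = top_share,
  logfc_without_top = logfc_without_top)
```

One replicate accounts for roughly three quarters of the group 2 total, so much of the apparent fold change rests on a single sample.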

ADD REPLY
0
Entering edit mode

Since I typically filter for FDR < 0.05, it would not be significant in my eyes. The logCPM is also quite low, so I would probably not trust it. If you still have the raw counts, I would run them through the current edgeR version with filterByExpr and the quasi-likelihood framework (glmQLFit/glmQLFTest), which is (from what I understand) what the developers currently recommend as the default approach. If you google "glmLRT vs glmQLF" you will find some posts at Bioconductor where they explain why they think it is superior in most cases.
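For reference, a minimal sketch of that workflow might look like the following. The counts are simulated and the two-group layout (Control vs Treated, n=3 each) is an assumption for illustration; it is not the poster's data or design, and the options shown are only one reasonable choice.

```r
library(edgeR)

set.seed(1)
# Simulated counts (hypothetical): 1000 genes x 6 samples
n_genes <- 1000
counts <- matrix(rnbinom(n_genes * 6, mu = 50, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:6)))
group <- factor(rep(c("Control", "Treated"), each = 3))

y <- DGEList(counts = counts, group = group)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)  # so contrasts can use the group names

# Expression filter that adapts to library size and group size
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]

y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Quasi-likelihood fit and F-test for Treated vs Control
fit <- glmQLFit(y, design, robust = TRUE)
con <- makeContrasts(TreatedvsControl = Treated - Control, levels = design)
qlf <- glmQLFTest(fit, contrast = con)

# Full results table with BH-adjusted p-values in the FDR column
res <- topTags(qlf, n = Inf)$table
head(res)
```

With real data, `counts` and `group` would come from the experiment, and the contrast would be adapted to the design (for example the 1h and 6h comparisons from the original code).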

ADD REPLY
0
Entering edit mode

Thank you for your reply.

That's clear. Actually, with recent datasets I've implemented exactly the same approach you mentioned (filterByExpr and glmQLF).

But in your opinion, looking at the results obtained with the previous version and the code I used (including filtering out genes with low expression, CPM < 0.5, i.e. about 10-15 reads), is there any reason to conclude that significant genes (FC > 1.5, FDR < 10%) are not reliable?

Yes, I get the point about the FDR cutoff, but a looser cutoff only increases the expected proportion of false positives across the list; it does not by itself question the reliability of any single DEG, does it?
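As a side note on the numbers in that question, a CPM cutoff translates into raw reads through the library size. The library sizes below (20 and 30 million reads) are an assumption, inferred from the "about 10-15 reads" statement rather than given in the thread.

```r
cpm_threshold <- 0.5

# Assumed library sizes in reads (not stated in the thread)
lib_sizes <- c(20e6, 30e6)

# CPM = count / library size * 1e6, so count = CPM * library size / 1e6
min_counts <- cpm_threshold * lib_sizes / 1e6
min_counts
```

Under that assumption the filter corresponds to 10 and 15 reads per sample, matching the range quoted in the post.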

Thank you very much for this fruitful discussion!

Best

Marianna

ADD REPLY
0
Entering edit mode

mariannapauletto : Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

ADD REPLY
0
Entering edit mode

Thank you,

sorry for that

ADD REPLY
