Question

edgeR (TMM): samples with an outlier still show extremely low p-value and FDR

Asked 6.9 years ago by Joe

Please see my data here:

https://user-images.githubusercontent.com/20710640/34529987-ba9f1414-f07b-11e7-913a-3ea787771a6e.JPG

https://github.com/Jinggg2016/NGS/issues/4

These are not raw counts but values normalized by edgeR. I listed the first few genes with the highest fold change and found that one sample is clearly an outlier (highlighted in yellow), which causes the high fold change. If I remove this outlier, the fold change is only about 2-fold. I am surprised that the p-value and FDR are both extremely small despite the outlier.
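Why one sample can dominate: the fold change is a ratio of group means, and a mean is not robust to a single extreme value. A minimal Python sketch with made-up numbers (not the data linked above) computes the log2 ratio of group means with and without the outlying sample:

```python
import math

def log2_fold_change(group_a, group_b):
    """log2 ratio of the group means of normalized expression values (e.g. CPM)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_a / mean_b)

# Hypothetical normalized values for one gene; the last treated sample is the outlier.
treated = [40.0, 42.0, 38.0, 400.0]
control = [20.0, 21.0, 19.0, 20.0]

with_outlier = log2_fold_change(treated, control)         # means 130 vs 20
without_outlier = log2_fold_change(treated[:-1], control)  # means 40 vs 20
print(f"log2FC with outlier:    {with_outlier:.2f}")
print(f"log2FC without outlier: {without_outlier:.2f}")
```

Here dropping one of eight samples turns a roughly 6.5-fold change into exactly 2-fold, the same pattern described above.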

Is this a common issue when using edgeR for differential expression?

If it is a real issue, how can I detect outliers when I have a large set of samples (e.g., >100 samples)?
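One simple screen that scales to many samples is a leave-one-out check: for each gene, drop each sample in turn and see how far the log2 fold change moves, then count which samples repeatedly dominate. This is a hypothetical heuristic, not something edgeR or DESeq2 does; the pseudocount `PRIOR`, the `threshold`, and all function names are assumptions chosen for illustration:

```python
import math
from collections import Counter

PRIOR = 0.5  # hypothetical pseudocount so log2 never sees zero

def log2_ratio(mean_a, mean_b):
    return math.log2((mean_a + PRIOR) / (mean_b + PRIOR))

def leave_one_out_means(values):
    # Mean of the group with each sample dropped in turn: (sum - x_i) / (n - 1).
    total, n = sum(values), len(values)
    return [(total - x) / (n - 1) for x in values]

def most_influential_sample(a, b):
    """Return (shift, group, index) for the sample whose removal moves log2FC most.

    Both groups need at least two samples.
    """
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    full = log2_ratio(mean_a, mean_b)
    candidates = [(abs(full - log2_ratio(m, mean_b)), "A", i)
                  for i, m in enumerate(leave_one_out_means(a))]
    candidates += [(abs(full - log2_ratio(mean_a, m)), "B", i)
                   for i, m in enumerate(leave_one_out_means(b))]
    return max(candidates)

def flag_samples(genes, threshold=1.0):
    """Count how often each sample is the most influential one with a shift above threshold."""
    counts = Counter()
    for a, b in genes.values():
        shift, group, index = most_influential_sample(a, b)
        if shift > threshold:
            counts[(group, index)] += 1
    return counts

# Hypothetical normalized values (group A = treated, group B = control).
genes = {
    "geneX": ([40.0, 42.0, 38.0, 400.0], [20.0, 21.0, 19.0, 20.0]),
    "geneY": ([10.0, 11.0, 9.0, 10.0], [10.0, 10.0, 11.0, 9.0]),
    "geneZ": ([5.0, 6.0, 4.0, 90.0], [5.0, 5.0, 6.0, 4.0]),
}
print(flag_samples(genes))
```

The leave-one-out means are computed from the group sum, so the cost per gene is linear in the number of samples. A sample that is flagged across many genes is a candidate for a closer look (e.g. in an MDS plot) rather than automatic removal.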

We usually use DESeq2 for DE analysis; DESeq2 can identify outliers (using Cook's distance) and reports NA for their p-values.
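For intuition about Cook's distance, here is a simplified analogue: an ordinary least-squares two-group model fitted to log2(x + 1) values of one gene. DESeq2 computes Cook's distance from its negative binomial GLM, not from this linear model, and the pseudocount of 1 is an assumption. In a group-means model, each fitted value is its group mean and each sample's leverage is 1/n for its group:

```python
import math

def cooks_distances(group_a, group_b, pseudocount=1.0):
    """Cook's distance per sample (group A first) for a two-group OLS fit on log2 values.

    Simplified analogue only; each group needs at least two samples.
    """
    groups = [[math.log2(x + pseudocount) for x in g] for g in (group_a, group_b)]
    m = sum(len(g) for g in groups)  # number of samples
    p = 2                            # fitted coefficients: one mean per group
    residuals, leverages = [], []
    for g in groups:
        mean = sum(g) / len(g)
        for y in g:
            residuals.append(y - mean)
            leverages.append(1.0 / len(g))  # hat value in a group-means model
    s2 = sum(e * e for e in residuals) / (m - p)  # residual variance estimate
    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return [(e * e / (p * s2)) * (h / (1.0 - h) ** 2)
            for e, h in zip(residuals, leverages)]

# Hypothetical normalized values for one gene.
treated = [40.0, 42.0, 38.0, 400.0]
control = [20.0, 21.0, 19.0, 20.0]
d = cooks_distances(treated, control)
worst = max(range(len(d)), key=d.__getitem__)
print(f"largest Cook's distance: sample {worst}, D = {d[worst]:.2f}")
```

Because this OLS version estimates the variance from the same residuals, a single extreme sample inflates s² and partly masks itself, so treat these values as a ranking of samples rather than comparing them with a fixed cutoff.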

Thanks,

Tags: p-value, edgeR, trimmed mean of M-values


Hello

Did you resolve your problem?

I see similar behaviour with edgeR. If one of my four biological replicates is an outlier, the program calls the gene as DE. I don't understand why this happens, but it seems to be common.

I am thinking of switching to DESeq2.

Reply by vm.higareda, 6.8 years ago


Hi swbarnes2,

I have exactly the same issue here. Genes that have an outlier value in one of the compared conditions are called DE by edgeR (small p-value and large abs(logFC)), and I am trying to figure out why.

So I calculated the log2 fold change from the mean CPM values of the compared conditions and found that it is similar to the one calculated by edgeR.

So edgeR's logFC is close to log2(meanCPM_A / meanCPM_B), except that it is adjusted so that genes with low counts do not usually get a large abs(logFC). Maybe it would be useful to calculate the logFC from the CPM medians, which is robust to outlier samples. However, if we do that, what's the point of using edgeR at all?
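The comparison described above can be sketched as follows: a log2 ratio of group summaries with a small prior added, computed once from means and once from medians. The prior value (0.5 in CPM units) is a hypothetical choice in the same spirit as, but not identical to, edgeR's prior count; the data are made up:

```python
import math
from statistics import mean, median

def log2fc(a, b, summary=mean, prior=0.5):
    """log2 ratio of group summaries with a hypothetical prior added to both.

    The prior pulls the ratio toward 0 when both groups are low.
    """
    return math.log2((summary(a) + prior) / (summary(b) + prior))

# Hypothetical CPM values; the last treated sample is an outlier.
treated = [40.0, 42.0, 38.0, 400.0]
control = [20.0, 21.0, 19.0, 20.0]
mean_fc = log2fc(treated, control)
median_fc = log2fc(treated, control, summary=median)
print(f"mean-based logFC:   {mean_fc:.2f}")
print(f"median-based logFC: {median_fc:.2f}")

# Low-count gene: without a prior, 0.2 vs 0.05 CPM looks like a 4-fold change.
low_a, low_b = [0.2, 0.2], [0.05, 0.05]
print(f"low-count logFC with prior:    {log2fc(low_a, low_b):.2f}")
print(f"low-count logFC without prior: {log2fc(low_a, low_b, prior=0.0):.2f}")
```

The median-based ratio is much less sensitive to the single outlier, but it is only a point estimate; it does not provide the count-based variance model that edgeR uses for testing.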

It would be really helpful if you could tell us what you did eventually. Did you find any further solution? Did you switch to DESeq2?

Thanks.

Reply by digrigor, 6.6 years ago


Please use the "ADD COMMENT" button to add comments.

Reply by Devon Ryan, 6.6 years ago

Answer 1 (2018-02-19)


Answered 6.8 years ago by swbarnes2

A tiny p-value means that the software is very confident that the difference between the groups is real (not zero). It is not a measure of how large that difference is.
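To illustrate that a test statistic (and therefore the p-value) reflects the difference relative to its estimated variability rather than the size of the difference alone, here is a sketch using Welch's t statistic. This is a stand-in: edgeR tests with negative binomial models, not a t-test. The log2 values are hypothetical:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# 0.3 log2 units (about 1.23-fold), very consistent replicates.
small_but_consistent = welch_t([5.30, 5.31, 5.29, 5.30], [5.00, 5.01, 4.99, 5.00])
# 2 log2 units (4-fold), noisy replicates.
large_but_noisy = welch_t([9.0, 5.0, 10.0, 4.0], [5.0, 6.0, 4.0, 5.0])
print(f"t (small, consistent): {small_but_consistent:.1f}")
print(f"t (large, noisy):      {large_but_noisy:.1f}")
```

The small but consistent difference produces a far larger statistic, and hence a far smaller p-value, than the 4-fold but noisy one.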



Even when one replicate is an outlier, as in the example above? Would you trust that gene?

Reply by vm.higareda, 6.8 years ago