Question

DE analysis with two samples in multiple samples

7.3 years ago

Satyajeet Khare ★ 1.6k

I use edgeR for DE analysis, following the standard protocol. The steps are as follows:

Alignment using HISAT2,
Count matrix generation using prepDE.py,
DE analysis in edgeR using the likelihood ratio test (LRT).

When I perform DE analysis with a count matrix containing only the two groups I need to compare, I get a larger number of differentially expressed genes than when I perform DE analysis with a count matrix containing a large number of samples and compare the same two groups using the contrast parameter.

I am assuming that the presence of counts from the other groups' samples affects the normalisation and the dispersion estimates for the samples from the two groups of interest.

My question is: which DE genes should I trust? The ones I get from the two-group count matrix, or the ones I get from the all-group count matrix?

EdgeR RNA-Seq • 1.7k views

score 0 · Answer 1 · 2017-08-14

7.3 years ago

Devon Ryan 104k

You should get essentially no DE genes with just two samples; if you are getting some, that points to a problem with edgeR (or with how you are using it). Never trust unreplicated experiments. You should only follow up on results obtained with multiple samples per group.

Even contrasts where the two "groups" being compared are comprised of single samples are largely meaningless.

Hi Devon,

Please let me correct myself. By two samples I meant two groups.

In other words, if I perform DE analysis between the C1 and T1 groups using a count matrix containing only C1 and T1, I get more DE genes. If I perform DE analysis between the C1 and T1 groups using a count matrix containing C1, T1, C2, T2, C3 and T3, I get fewer DE genes.

7.3 years ago by Satyajeet Khare ★ 1.6k

That's better, then. Your ability to properly assess variance increases with the number of samples, so in general the design with more groups will be more reliable. My presumption is that in the two-group case, genes with extreme variance aren't being penalized as much.

7.3 years ago by Devon Ryan 104k
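
To illustrate the point about variance estimates improving with sample number, here is a minimal sketch in plain Python. It is a simplified Gaussian analogue, not edgeR's negative binomial dispersion estimation. The assumptions are mine, not from the thread: 3 replicates per group, the same true variance in every group, and 2000 simulated genes. It pools within-group variance across either 2 or 6 groups and compares how much the per-gene estimates scatter.

```python
import random
import statistics


def pooled_variance(groups):
    """Return the pooled within-group variance and its residual degrees of freedom."""
    sum_sq = 0.0
    df = 0
    for values in groups:
        mean = statistics.fmean(values)
        sum_sq += sum((v - mean) ** 2 for v in values)
        df += len(values) - 1
    return sum_sq / df, df


def simulate(n_groups, n_reps=3, n_genes=2000, sigma=1.0, seed=0):
    """Simulate per-gene pooled variance estimates (n_reps and n_genes are assumed values)."""
    rng = random.Random(seed)
    estimates = []
    df = 0
    for _ in range(n_genes):
        # Group means are all set to 0: within-group variance does not depend on them.
        groups = [
            [rng.gauss(0.0, sigma) for _ in range(n_reps)]
            for _ in range(n_groups)
        ]
        var, df = pooled_variance(groups)
        estimates.append(var)
    return estimates, df


two_group_est, two_group_df = simulate(n_groups=2)
six_group_est, six_group_df = simulate(n_groups=6)

# Spread of the variance estimates around the true value (sigma**2 = 1).
two_group_spread = statistics.stdev(two_group_est)
six_group_spread = statistics.stdev(six_group_est)

print(f"2 groups: residual df = {two_group_df}, spread of estimates = {two_group_spread:.3f}")
print(f"6 groups: residual df = {six_group_df}, spread of estimates = {six_group_spread:.3f}")
```

With more groups in the model there are more residual degrees of freedom behind each gene's variance estimate, so the estimates scatter less. In the two-group fit, a gene whose variance happens to be underestimated is more likely to look significant.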

What if the library prep method for C1, T1, C2 and T2 is different from the one used for C3, T3, C4 and T4? Both are polyA-based kits, but from different manufacturers.

Would that lead to variability for technical reasons? In that case, should I make one count matrix for the first four groups and a separate count matrix for the last four?
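
One structural fact about this layout can be checked directly. The sketch below is plain Python and does not call edgeR. The labels `kitA` and `kitB` and the 3 replicates per group are assumptions for illustration. It builds a design matrix with one column per group, appends a kit indicator column, and compares the ranks.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


groups = ["C1", "T1", "C2", "T2", "C3", "T3", "C4", "T4"]
# Hypothetical kit labels: the first four groups use one kit, the last four the other.
kit_of_group = {g: ("kitA" if g in ("C1", "T1", "C2", "T2") else "kitB") for g in groups}
n_reps = 3  # assumed number of replicates per group

samples = [g for g in groups for _ in range(n_reps)]

# One indicator column per group (no intercept).
group_design = [[1 if s == g else 0 for g in groups] for s in samples]
# Same design with an extra indicator column for kitB.
with_kit = [
    row + [1 if kit_of_group[s] == "kitB" else 0]
    for row, s in zip(group_design, samples)
]

rank_groups = matrix_rank(group_design)
rank_with_kit = matrix_rank(with_kit)

print(f"group-only design: {len(group_design[0])} columns, rank {rank_groups}")
print(f"group + kit design: {len(with_kit[0])} columns, rank {rank_with_kit}")
```

The kit column is the sum of the C3, T3, C4 and T4 columns, so adding it does not raise the rank: with every group prepared using a single kit, a kit effect cannot be estimated separately from the group effects. A contrast such as T1 vs C1 stays within one kit. The sketch does not settle whether the two kits differ in variability, which is what would matter for the choice between one matrix and two.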