Question

How to compare two groups of 3 samples

Asked 8.2 years ago by moxu

I have 2 experimental conditions, each generating a series of values for some genes, like the following:

    s1   s2   s3
g1  n11  n12  n13
g2  n21  n22  n23
...

s1, s2, s3 are samples; g1, g2, ... are gene names; n11, n12, ... are the corresponding gene expression levels, with nij being the expression of gene i in sample j. s1 and s2 belong to one group (treatment), and s3 alone forms the other group (control).

My biological question: how do I find out whether a gene is differentially expressed? Or, statistically, what test should I use to determine whether ni3 is significantly different from ni1 and ni2?

Thank you!

A quick heatmap should give you an idea of what the expression looks like. Please make sure the data are normalized first.

Reply by sridhar56, 8.2 years ago

Assume the values are normally distributed.

Reply by moxu, 8.2 years ago

Answer by shunyip, posted 2017-02-22 (8.2 years ago)

Have a look at the manuals of Bioconductor tools such as limma, edgeR and DESeq2.

I use edgeR, but I'm not sure edgeR is suited for comparing a one-sample group with another group. Purely statistically, what would be the way to go? A pooled t-test requires the variance of each group, but a group with one sample has no variance. One way I can think of is to compute mean_i = mean(ni1, ni2, ni3) and var_i = var(ni1, ni2, ni3), and do a t-test using ti = (ni3 - mean_i) / sqrt(var_i / 3). A similar test would be ti = (ni3 - (ni1 + ni2)/2) / sqrt(var(ni1, ni2) / 2). I'm not sure whether either of these two methods is appropriate.

Reply by moxu, 8.2 years ago
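The two statistics proposed above can be computed per gene as in the following minimal sketch (Python, standard library only; the function names and example values are hypothetical). For comparison it also computes the classical pooled (equal-variance) two-sample t statistic for the groups {ni1, ni2} and {ni3}: the single control sample contributes no degrees of freedom, so the pooled variance reduces to var(ni1, ni2), the standard error is sqrt(var(ni1, ni2) * (1/2 + 1/1)), and df = 2 + 1 - 2 = 1. With 1 degree of freedom the t distribution is the standard Cauchy distribution, so a two-sided p-value is 1 - 2·atan(|t|)/π.

```python
import math
from statistics import mean, variance


def t1_all_three(n1, n2, n3):
    """First proposal: (ni3 - mean_i) / sqrt(var_i / 3), using all three samples.

    Not a standard t statistic: ni3 also enters mean_i and var_i.
    """
    values = [n1, n2, n3]
    return (n3 - mean(values)) / math.sqrt(variance(values) / 3)


def t2_one_sample(n1, n2, n3):
    """Second proposal: treats ni3 as a fixed reference value (one-sample t, df = 1).

    Undefined (division by zero) if n1 == n2, because var(n1, n2) is then 0.
    """
    return (n3 - (n1 + n2) / 2) / math.sqrt(variance([n1, n2]) / 2)


def t_pooled(n1, n2, n3):
    """Pooled (equal-variance) two-sample t for groups {n1, n2} and {n3}, df = 1.

    The lone control sample contributes 0 degrees of freedom, so the pooled
    variance is just var(n1, n2); the 1/1 term in the standard error accounts
    for the sampling variability of the single control value.
    """
    pooled_var = variance([n1, n2])
    se = math.sqrt(pooled_var * (1 / 2 + 1 / 1))
    return (n3 - (n1 + n2) / 2) / se


def two_sided_p_df1(t):
    """Two-sided p-value for t with 1 df (standard Cauchy: CDF = 1/2 + atan(t)/pi)."""
    return 1.0 - 2.0 * math.atan(abs(t)) / math.pi


if __name__ == "__main__":
    n1, n2, n3 = 10, 12, 20  # hypothetical values for one gene
    print("t1 =", round(t1_all_three(n1, n2, n3), 3))
    for name, stat in [("t2", t2_one_sample), ("pooled", t_pooled)]:
        t = stat(n1, n2, n3)
        print(f"{name} = {t:.3f}, two-sided p (df = 1) = {two_sided_p_df1(t):.3f}")
```

For the hypothetical gene (10, 12, 20), the second statistic is 9 (two-sided p ≈ 0.07), while the pooled statistic is 9/√3 ≈ 5.2 (p ≈ 0.12): the second formula ignores the variability of the control value, so it is always √3 times larger than the pooled one. With a single residual degree of freedom, even large t values carry little evidence.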

If you do not have replicates, you have to assume that the single sample's expression values are accurate.

Instead of performing a t-test gene by gene, I would suggest calculating the log2 fold change for all genes and then calling the genes with significantly large log2 fold changes as differentially expressed genes (DEGs). This way, you can "borrow" information across genes to compensate for things like batch effects. I believe it is reasonably safe to assume that the log2 fold changes are normally distributed, but you should check.

Reply by shunyip, 8.2 years ago
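The fold-change approach suggested above can be sketched as follows, assuming the values are already normalized, that a pseudocount of 1 is acceptable to avoid log(0), and that the log2 fold changes across all genes are roughly normal. Each gene's log2 fold change is converted to a z-score against the mean and standard deviation of all genes, and a two-sided normal p-value is computed. The function names, the gene -> (s1, s2, s3) layout and the alpha threshold are hypothetical, and no multiple-testing correction is applied.

```python
import math
from statistics import NormalDist, mean, stdev


def log2_fold_changes(counts, pseudocount=1.0):
    """Per-gene log2 fold change of the treatment mean over the control.

    counts maps a gene name to (s1, s2, s3): s1 and s2 are treatment
    replicates, s3 is the single control. The pseudocount (an assumption)
    avoids log(0) and division by zero for genes with zero counts.
    """
    return {
        gene: math.log2((mean((s1, s2)) + pseudocount) / (s3 + pseudocount))
        for gene, (s1, s2, s3) in counts.items()
    }


def flag_extreme_genes(lfc, alpha=0.05):
    """Flag genes whose log2FC lies in either tail of the genome-wide distribution.

    Assumes the log2FC values across genes are roughly normal. Returns
    gene -> (log2FC, z-score, two-sided p-value, flagged). No multiple-testing
    correction is applied.
    """
    values = list(lfc.values())
    mu, sd = mean(values), stdev(values)
    std_normal = NormalDist()
    result = {}
    for gene, x in lfc.items():
        z = (x - mu) / sd
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        result[gene] = (x, z, p, p < alpha)
    return result
```

Because the reference distribution is estimated from all genes at once, a single extreme gene is judged against thousands of others rather than against its own three values, which is the "borrowing" of information described above.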

The log2FC distribution looks bimodal, with its two modes on either side of 0.

Reply by moxu, 8.2 years ago

You might need to normalize your expression data then. Are you using CPM or TPM?

Reply by shunyip, 8.2 years ago
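CPM (counts per million) scales each sample's counts by that sample's total count, its library size: CPM = count / library size × 10^6. TPM additionally adjusts for feature length; CPM does not. A minimal Python sketch of CPM, assuming raw counts stored in a hypothetical gene -> (s1, s2, s3) mapping:

```python
def library_sizes(counts):
    """Total count per sample (column sums of a gene -> per-sample-counts mapping)."""
    return [sum(column) for column in zip(*counts.values())]


def cpm(counts):
    """Counts per million: each count divided by its sample's library size, times 1e6."""
    sizes = library_sizes(counts)
    return {
        gene: tuple(c / size * 1e6 for c, size in zip(row, sizes))
        for gene, row in counts.items()
    }
```

After this scaling every sample sums to 10^6, so when library sizes are already nearly equal the normalized values differ from the raw ones only by a nearly constant factor.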

The library sizes are almost identical: their ratio is about 1.006.

Since you asked about CPM or TPM, I have to admit that I lied: it's not gene expression data but ChIP-seq signal. But I don't think that matters, right?

Reply by moxu, 8.2 years ago

It shouldn't matter.

Um... did you filter out all signals where any of the samples is zero or has a very low read count?

From personal experience, when I see a bimodal distribution in this situation, one of the peaks is often caused by low-count genes. Usually, after I filter them out, the distribution becomes normal.

Reply by shunyip, 8.2 years ago
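The filtering step described above can be sketched as follows: drop any feature whose count is zero or below a cutoff in at least one sample, then recompute the log2 fold changes on what remains. The cutoff of 10 and the gene -> (s1, s2, s3) layout are hypothetical assumptions, not values from this thread; edgeR users may prefer edgeR's own filterByExpr function.

```python
def filter_low_counts(counts, min_count=10):
    """Keep only features with at least min_count reads in every sample.

    min_count = 10 is a hypothetical cutoff; pick it from your own data,
    for example by comparing the log2FC histogram before and after filtering.
    """
    return {gene: row for gene, row in counts.items() if min(row) >= min_count}
```

Requiring the cutoff in every sample (rather than on average) removes exactly the features where one sample is zero or nearly so, which are the ones most likely to produce extreme, unstable fold changes.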

Quoting moxu: "I use edgeR, not sure if edgeR is suited for comparing a one-sample group with another group. Just statistically, what would be the way to go?"

Statistically, the best way to go is to avoid designs with only one replicate. Some software can calculate differential expression for such a limited set of samples, but the results will be very unreliable and would need to be replicated in a larger independent cohort before they can be generalized.

Reply by WouterDeCoster, 8.2 years ago