Question

gene differential expression array analysis: weird pvalue distribution

0

10.3 years ago

Iryna Nikolayeva ▴ 30

Dear all,

I have stumbled upon a weird phenomenon while analysing a small gene expression dataset from an Affymetrix array: the distribution of differential expression p-values is "impoverished" in significant p-values. Instead of a uniform distribution with a peak at low p-values, I get a uniform distribution with a drop at low p-values.
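One way to put a number on the "drop at low p-values" is to bin the p-values and compare the lowest bin with what a uniform distribution would give. The following is only a minimal sketch: the function names `pvalue_histogram` and `low_bin_ratio` and the simulated p-values are made up for illustration and are not from the actual dataset.

```python
import random

def pvalue_histogram(pvalues, n_bins=20):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # p == 1.0 is placed in the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def low_bin_ratio(pvalues, n_bins=20):
    """Observed count in the lowest bin divided by the count expected under uniformity."""
    counts = pvalue_histogram(pvalues, n_bins)
    expected = len(pvalues) / n_bins
    return counts[0] / expected

# Hypothetical illustration data, not real array results.
rng = random.Random(0)
uniform_p = [rng.random() for _ in range(10000)]
# Simulate a depleted distribution by discarding most p-values below 0.05.
depleted_p = [p for p in uniform_p if p >= 0.05 or rng.random() < 0.3]

print("uniform :", round(low_bin_ratio(uniform_p), 2))   # close to 1
print("depleted:", round(low_bin_ratio(depleted_p), 2))  # well below 1
```

A ratio near 1 is what pure noise would give, a ratio well above 1 is the usual peak of true positives, and a ratio well below 1 corresponds to the depletion described here.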

My dataset is very small: I'm basically comparing 5 samples in one group versus 6 samples in the other group. I have read that small sample sizes can generate false positives, but here I see the opposite.

The dataset consists of 4 groups of measurements: patients a at t0 (m_a_t0) and t1 (m_a_t1), and patients b at t0 (m_b_t0) and t1 (m_b_t1). I would like to compare the expression changes (m_a_t1 - m_a_t0) and (m_b_t1 - m_b_t0). I normalised all the arrays together using a standard method, RMA. Could this be the cause? If so, how should I proceed instead?
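To make the intended comparison concrete, here is a minimal sketch for a single gene. It assumes the same patients were measured at t0 and t1, so a per-patient change can be computed, and it swaps in an exact permutation test rather than whatever test produced the p-values discussed in this thread. The expression values are hypothetical log2 intensities and the helper names `deltas` and `exact_permutation_pvalue` are invented for the example.

```python
from itertools import combinations
from statistics import mean

def deltas(t0, t1):
    """Per-patient change t1 - t0 (assumes the two lists are in the same patient order)."""
    return [after - before for before, after in zip(t0, t1)]

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    n_extreme = 0
    n_relabelings = 0
    # Enumerate every way of choosing which pooled values are labelled "a".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point error
            n_extreme += 1
        n_relabelings += 1
    return n_extreme / n_relabelings

# Hypothetical values for one gene: 5 patients in group a, 6 in group b.
m_a_t0 = [7.1, 6.9, 7.3, 7.0, 7.2]
m_a_t1 = [8.1, 8.0, 8.2, 8.1, 8.3]
m_b_t0 = [7.0, 7.2, 6.8, 7.1, 7.3, 6.9]
m_b_t1 = [7.1, 7.2, 6.9, 7.0, 7.4, 7.0]

p = exact_permutation_pvalue(deltas(m_a_t0, m_a_t1), deltas(m_b_t0, m_b_t1))
print(p)
```

With 5 versus 6 samples there are only 462 distinct relabelings, so the per-gene p-values from such a test are fairly coarse.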

Thanks for your help!

pvalue-distribution gene-expression-array • 2.8k views

updated 3.1 years ago by Ram 45k • written 10.3 years ago by Iryna Nikolayeva ▴ 30

4

In my experience, at least, this tends to happen when there's some sort of uncontrolled confounder. Have a look at a PCA plot and see if anything seems amiss. Perhaps you have some sort of gender interaction that needs to be offset; that's not uncommon.
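As a rough idea of what such a PCA check looks for, the sketch below computes the first principal component scores of the samples by power iteration, using only the standard library. Everything in it is an assumption made for illustration: the simulated matrix `expr`, the hidden subgroup, and the function name `first_pc_scores`. On real data you would more likely use an existing PCA routine and plot the first two components.

```python
import random

def first_pc_scores(expr, n_iter=200, seed=0):
    """Unit-length PC1 scores, one per sample.

    expr is a list of genes, each a list of per-sample expression values.
    """
    n_samples = len(expr[0])
    # Centre each gene across samples.
    centered = []
    for gene in expr:
        gene_mean = sum(gene) / n_samples
        centered.append([v - gene_mean for v in gene])
    # Sample-by-sample Gram matrix; its leading eigenvector gives the PC1 score direction.
    gram = [[sum(g[i] * g[j] for g in centered) for j in range(n_samples)]
            for i in range(n_samples)]
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iter):  # power iteration
        vec = [sum(gram[i][j] * vec[j] for j in range(n_samples))
               for i in range(n_samples)]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec

# Hypothetical data: 50 genes x 11 samples with an unannotated subgroup of samples.
rng = random.Random(1)
hidden = {0, 1, 2, 6, 7}
expr = []
for g in range(50):
    shift = 2.0 if g < 30 else 0.0  # only the first 30 genes respond to the hidden factor
    expr.append([8.0 + (shift if s in hidden else 0.0) + rng.gauss(0.0, 0.3)
                 for s in range(11)])

scores = first_pc_scores(expr)
for s, score in enumerate(scores):
    print(s, "hidden" if s in hidden else "other", round(score, 2))
```

If the samples split along PC1 in a way that does not match the groups being compared, that split is a candidate confounder.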

updated 3.1 years ago by Ram 45k • written 10.3 years ago by Devon Ryan 105k

1

Thanks a lot for your response! We had the same suspicion, and I had already made PCA plots. Across the entire group of patients, the first 2 components do separate 2 subgroups, but these are not linked to the biological differences we are interested in, nor to any confounding factor we suspect (neither gender nor the day of the expression measurement). I will try to correct for the first 2 PCs.

10.3 years ago by Iryna Nikolayeva ▴ 30

1

You may find the sva package in R (Bioconductor) useful in cases like this.

10.3 years ago by Devon Ryan 105k

score 2 · Answer 1 · 2015-05-08

2

10.3 years ago

mark.ziemann ★ 2.0k

In addition to Devon's PCA suggestion, I might add that there's no guarantee that you will get any significant genes if the variability within each sample group is too high. In pathological contexts (humans with disease), a high degree of variability is to be expected, and if that variability is too great, you won't find any differential gene expression (DGE). I would suggest giving limma a try with a linear model that corrects for the important factors gleaned from the PCA. If that doesn't get you any DGE, you can still try GSEA to identify trends at the "pathway" level.
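To illustrate why adding a factor from the PCA to the linear model can matter, here is a minimal sketch of an ordinary least-squares fit for one gene with a group term and one covariate. This is not limma (there is no empirical Bayes moderation and no p-value), only plain least squares written with the standard library. The covariate values `pc1`, the simulated expression `y`, and the helpers `solve` and `fit_ols` are all hypothetical.

```python
from statistics import mean

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_ols(design, y):
    """Least-squares coefficients via the normal equations (rows of design = samples)."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical setup: 5 samples in group 0, 6 in group 1, plus a confounded PC1 score.
group = [0] * 5 + [1] * 6
pc1 = [0.5, 0.4, 0.6, -0.3, 0.3, -0.4, -0.5, 0.4, -0.3, -0.5, -0.2]
# Simulated gene with a true group effect of 0.5 and a confounder effect of 2.0 (no noise).
y = [7.0 + 0.5 * g + 2.0 * c for g, c in zip(group, pc1)]

naive = mean(y[5:]) - mean(y[:5])                       # ignores the confounder
design = [[1.0, float(g), c] for g, c in zip(group, pc1)]
intercept, group_effect, pc1_effect = fit_ols(design, y)

print("naive difference   :", round(naive, 3))
print("adjusted group term:", round(group_effect, 3))
```

In this toy case the unadjusted group difference even has the wrong sign, while the model that includes the covariate recovers the simulated effect. With real data the adjustment also costs degrees of freedom, which matters with only 11 samples.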