Dear all,
I have stumbled upon a strange phenomenon while analysing a small gene expression dataset from an Affymetrix array: the differential expression p-values are depleted at the low (significant) end. Instead of a uniform distribution with a peak at low p-values, I get a uniform distribution with a dip at low p-values.
My dataset is very small: I am comparing 5 samples in one group against 6 samples in the other. I have read that small sample sizes can generate false positives, but here I see the opposite.
The dataset consists of 4 groups of measurements: measurements of patients a (m_a) at t0 (m_a_t0) and t1 (m_a_t1), and measurements of patients b at t0 (m_b_t0) and t1 (m_b_t1). I would like to compare the expression changes (m_a_t1 - m_a_t0) and (m_b_t1 - m_b_t0). I normalised all the arrays together using a standard method, rma. Could this be the cause? If so, how should I proceed instead?
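To make the comparison concrete, here is a minimal sketch for a single probe set. It assumes the same patients were measured at t0 and t1, so that a per-patient change can be computed, and it uses made-up log2 values. The exact permutation test is only an illustrative choice for such small groups, not the method used in the actual analysis, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical log2 expression values for ONE probe set (invented numbers).
m_a_t0 = [7.1, 6.8, 7.4, 7.0, 6.9]        # 5 patients in group a at t0
m_a_t1 = [7.9, 7.5, 8.3, 7.6, 7.8]        # the same 5 patients at t1
m_b_t0 = [7.0, 7.2, 6.7, 7.1, 6.9, 7.3]   # 6 patients in group b at t0
m_b_t1 = [7.1, 7.4, 6.6, 7.3, 7.0, 7.2]   # the same 6 patients at t1


def paired_changes(t0, t1):
    """Per-patient change t1 - t0 (assumes both lists are in the same patient order)."""
    return [after - before for before, after in zip(t0, t1)]


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of assigning len(x) of the pooled values to the
    first group and counts how often the absolute mean difference is at
    least as large as the observed one.
    """
    pooled = x + y
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    observed = abs(mean(x) - mean(y))
    hits = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    return hits / n_splits, n_splits


d_a = paired_changes(m_a_t0, m_a_t1)
d_b = paired_changes(m_b_t0, m_b_t1)
p_value, n_splits = exact_permutation_pvalue(d_a, d_b)
print(f"splits enumerated: {n_splits}, p-value: {p_value:.4f}")
```

With 5 versus 6 samples there are only 462 possible splits, so the smallest p-value this particular test can return is 1/462 (about 0.0022), which is worth keeping in mind before any multiple-testing correction.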
Thanks for your help!
In my experience, at least, this tends to happen when there's some sort of uncontrolled confounder. Have a look at a PCA plot and see if anything seems amiss. Perhaps you have some sort of gender effect that needs to be accounted for; that's not uncommon.
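To illustrate how an uncontrolled confounder can produce this pattern, here is a small simulation sketch. Everything in it is an assumption made for illustration: the hidden batch layout, the effect sizes, and the number of genes are invented, and no gene is truly differentially expressed. The hidden factor is spread roughly evenly over both groups, so it inflates the within-group variance without shifting the group means much, and the t-statistics shrink towards zero.

```python
import math
import random

random.seed(1)

# 5 samples in group 0 and 6 in group 1, as in the question.
group = [0] * 5 + [1] * 6
# Hypothetical hidden factor, roughly balanced across the two groups.
batch = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
T_CRIT_DF9 = 2.262


def pooled_t(values, labels):
    """Ordinary two-sample t-statistic with pooled variance."""
    x = [v for v, g in zip(values, labels) if g == 0]
    y = [v for v, g in zip(values, labels) if g == 1]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    df = len(x) + len(y) - 2
    se = math.sqrt(ss / df * (1 / len(x) + 1 / len(y)))
    return (mean_y - mean_x) / se


def fraction_significant(n_genes, batch_sd):
    """Fraction of null genes with |t| above the 5% critical value."""
    hits = 0
    for _ in range(n_genes):
        # Each gene responds to the hidden factor with its own effect size.
        effect = random.gauss(0, batch_sd)
        values = [random.gauss(0, 1) + effect * b for b in batch]
        if abs(pooled_t(values, group)) > T_CRIT_DF9:
            hits += 1
    return hits / n_genes


frac_clean = fraction_significant(4000, batch_sd=0.0)
frac_confounded = fraction_significant(4000, batch_sd=3.0)
print(f"no hidden factor:   {frac_clean:.3f}")
print(f"with hidden factor: {frac_confounded:.3f}")
```

Without the hidden factor, the fraction below p = 0.05 should sit close to 0.05; with it, the fraction should come out clearly lower, which corresponds to the dip at low p-values described in the question.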
Thanks a lot for your response! We indeed had the same suspicion. I had already made PCA plots. Across the entire set of patients, the first 2 components do separate 2 subgroups, but these are linked neither to the biological differences we are interested in nor to any confounding factor we suspected (not gender, and not the day of the expression measurement). I will try to correct for the first 2 PCs.
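As a rough sketch of what such a correction does, the snippet below tests the group effect while including one extra covariate in the linear model. It is a simplification under stated assumptions: the covariate here is a known, simulated hidden factor rather than a principal component estimated from the data, all values are invented, and the helper names are hypothetical. With real PCs, each one added to the model costs a degree of freedom (only 11 samples here), and a PC that correlates with the groups of interest can absorb real signal.

```python
import math
import random

random.seed(2)

group = [0.0] * 5 + [1.0] * 6
# Hypothetical hidden factor, here assumed to be known exactly.
batch = [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

# Two-sided 5% critical value of Student's t with 8 degrees of freedom
# (11 samples minus intercept, group and covariate).
T_CRIT_DF8 = 2.306


def residualize(values, covariate):
    """Residuals after a least-squares fit on intercept + covariate."""
    n = len(values)
    mean_v, mean_c = sum(values) / n, sum(covariate) / n
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    slope = sxy / sxx
    return [(v - mean_v) - slope * (c - mean_c) for c, v in zip(covariate, values)]


def adjusted_t(values, labels, covariate):
    """t-statistic for the group coefficient in values ~ 1 + group + covariate.

    Both the response and the group indicator are residualized on the
    covariate; regressing one set of residuals on the other gives the same
    group coefficient as the full model.
    """
    res_y = residualize(values, covariate)
    res_g = residualize(labels, covariate)
    sgg = sum(g * g for g in res_g)
    beta = sum(g * y for g, y in zip(res_g, res_y)) / sgg
    rss = sum((y - beta * g) ** 2 for g, y in zip(res_g, res_y))
    df = len(values) - 3
    return beta / math.sqrt(rss / df / sgg)


n_genes = 4000
hits = 0
for _ in range(n_genes):
    # Null genes: no group effect, only noise plus a gene-specific batch effect.
    effect = random.gauss(0, 3.0)
    values = [random.gauss(0, 1) + effect * b for b in batch]
    if abs(adjusted_t(values, group, batch)) > T_CRIT_DF8:
        hits += 1

frac_adjusted = hits / n_genes
print(f"fraction with |t| above the 5% cutoff after adjustment: {frac_adjusted:.3f}")
```

In this idealised setting the fraction should return to roughly 0.05, because the model now accounts for the extra variance instead of leaving it in the residuals.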
You may find the sva package for R (available from Bioconductor) useful in cases like this.