Differential gene expression analysis of array data: weird p-value distribution
9.6 years ago

Dear all,

I have stumbled on a weird phenomenon while analysing a small gene expression dataset from an Affymetrix array: the differential expression p-values are "impoverished" in significant p-values. Instead of a uniform distribution with a peak at low p-values, I get a uniform distribution with a drop at low p-values.
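One way to put a number on this pattern is to bin the p-values and compare the count in the lowest bin with what a uniform distribution predicts (ratio near 1: no signal; above 1: enrichment of small p-values; below 1: the depletion described here). A minimal sketch in plain Python; the `conservative` p-values are simulated stand-ins, not the poster's data.

```python
import random

def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # p == 1.0 would index past the end, so clamp it into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def low_bin_ratio(pvalues, n_bins=10):
    """Observed / expected count in the lowest bin under a uniform null."""
    counts = pvalue_histogram(pvalues, n_bins)
    expected = len(pvalues) / n_bins
    return counts[0] / expected

# Simulated illustration only (not real data)
random.seed(0)
uniform = [random.random() for _ in range(10_000)]
# sqrt(p) >= p on [0, 1], so these p-values are pushed away from zero
conservative = [p ** 0.5 for p in uniform]
print(pvalue_histogram(conservative))
print(round(low_bin_ratio(uniform), 2), round(low_bin_ratio(conservative), 2))
```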

My dataset is very small: I'm basically comparing 5 samples in one group versus 6 samples in the other group. I have read that small samples can generate false positives, but here I see the opposite.

The dataset consists of 4 groups of measurements: patients a, measured at t0 (m_a_t0) and t1 (m_a_t1), and patients b, measured at t0 (m_b_t0) and t1 (m_b_t1). I would like to compare the changes (m_a_t1 - m_a_t0) and (m_b_t1 - m_b_t0). I normalised all the arrays together using a standard method, RMA. Could this be the cause? If so, how should I proceed instead?
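Assuming each of the 5 group-a and 6 group-b patients was measured at both t0 and t1 (the question implies this but does not state it outright), the comparison of (m_a_t1 - m_a_t0) with (m_b_t1 - m_b_t0) can be reduced, gene by gene, to one change score per patient followed by a two-group test. The sketch below, in plain Python, uses an exact permutation test on the mean change; the function names and example values are hypothetical. In limma the same question would typically be written as an interaction contrast, (a.t1 - a.t0) - (b.t1 - b.t0), within a linear model.

```python
from itertools import combinations
from statistics import mean

def within_patient_changes(t0, t1):
    """Per-patient change t1 - t0 for one gene (lists aligned by patient)."""
    return [after - before for before, after in zip(t0, t1)]

def permutation_pvalue(x, y):
    """Exact two-sided permutation test for a difference in mean change.

    Enumerates every split of the pooled values into groups of len(x)
    and len(y); with 5 vs 6 patients that is C(11, 5) = 462 splits.
    """
    pooled = x + y
    n = len(pooled)
    observed = abs(mean(x) - mean(y))
    extreme = total = 0
    for idx in combinations(range(n), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # small tolerance so ties with the observed split are counted
        if abs(mean(gx) - mean(gy)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical log2 values for one gene (not real data)
a_t0 = [7.1, 6.8, 7.4, 7.0, 6.9]
a_t1 = [8.0, 7.9, 8.3, 7.7, 8.1]
b_t0 = [7.2, 7.0, 6.9, 7.3, 7.1, 6.8]
b_t1 = [7.3, 6.9, 7.1, 7.2, 7.3, 6.9]
delta_a = within_patient_changes(a_t0, a_t1)
delta_b = within_patient_changes(b_t0, b_t1)
print(permutation_pvalue(delta_a, delta_b))
```

With 5 vs 6 patients there are only 462 distinct splits, so the smallest p-value this exact test can return is 1/462, about 0.002; the example gene above, where every group-a change exceeds every group-b change, sits exactly at that floor.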

Thanks for your help!

pvalue-distribution gene-expression-array

In my experience, at least, this tends to happen when there's some sort of uncontrolled confounder. Have a look at a PCA plot and see if anything seems amiss. Perhaps you have some sort of gender interaction that needs to be offset, which is not uncommon.
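A PCA plot of this kind shows each sample's score on the first components of the gene-centred expression matrix. A minimal NumPy sketch of computing those scores, assuming a genes x samples matrix of log2 values; the data and the `hidden` subgroup label are simulated and hypothetical.

```python
import numpy as np

def pca_sample_scores(expr, n_components=2):
    """PCA of samples from a genes x samples expression matrix.

    Each gene is centred across samples; the SVD then gives sample scores
    (rows of Vt scaled by the singular values) and variance explained.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = vt[:n_components].T * s[:n_components]  # samples x components
    var_explained = s**2 / np.sum(s**2)
    return scores, var_explained[:n_components]

# Simulated data: 1000 genes, 11 samples, and an unknown subgroup that
# shifts 200 genes (hypothetical, for illustration only)
rng = np.random.default_rng(1)
expr = rng.normal(8.0, 0.3, size=(1000, 11))
hidden = np.array([0, 1] * 5 + [0])
expr[:200, hidden == 1] += 2.0

scores, ve = pca_sample_scores(expr)
for label, (pc1, pc2) in zip(hidden, scores):
    print(label, round(pc1, 1), round(pc2, 1))
print("variance explained:", np.round(ve, 2))
```

In this simulation PC1 splits the samples by `hidden`, which is the kind of pattern to look for when no known variable explains the separation.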


Thanks a lot for your response! We indeed had the same suspicion. I had already made PCA plots: for the entire group of patients, the first 2 components do separate 2 subgroups, but these are linked neither to the biological differences we are interested in nor to any confounding factor we suspect (not gender, and not the day of the expression measurement). I will try to correct for the first 2 PCs.
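Correcting for the first 2 PCs is commonly done by adding the PC scores as covariates to the per-gene linear model rather than subtracting them from the data. A NumPy sketch of that fit (ordinary least squares, not limma's moderated statistics), assuming `expr` is genes x samples, `group` is a 0/1 vector and `covariates` has one column per nuisance variable, such as PC1 and PC2 scores; the function name and simulated data are hypothetical.

```python
import numpy as np

def group_effect_with_covariates(expr, group, covariates):
    """Per-gene OLS: expression ~ intercept + group + covariates.

    Returns the group coefficient, its ordinary t-statistic for every
    gene, and the residual degrees of freedom.
    """
    n = expr.shape[1]
    design = np.column_stack([np.ones(n), group, covariates])
    # lstsq fits all genes at once: each column of expr.T is one response
    coef, _, rank, _ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ coef
    df = n - rank
    sigma2 = np.sum(resid**2, axis=0) / df
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    return coef[1], coef[1] / se, df

# Simulated example: 5 vs 6 samples plus a hypothetical nuisance factor
rng = np.random.default_rng(2)
group = np.array([0] * 5 + [1] * 6)
hidden = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
expr = rng.normal(0.0, 0.2, size=(500, 11))
expr[:, hidden == 1] += 2.0   # nuisance shift in every gene
expr[:50, group == 1] += 1.0  # true group effect in 50 genes
beta, t, df = group_effect_with_covariates(expr, group, hidden[:, None])
print(df, round(float(beta[:50].mean()), 2), round(float(np.median(t[:50])), 1))
```

Two caveats: each covariate costs one residual degree of freedom (11 arrays with intercept, group and 2 PCs leave 7), and if a PC is itself correlated with the group, adjusting for it removes real signal.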


You may find the SVA package in R useful in cases like this.

9.6 years ago
mark.ziemann ★ 1.9k

In addition to Devon's PCA suggestion, I would add that there's no guarantee you will get any significant genes if the variability within each sample group is too high. In pathological contexts (humans with disease) a high degree of variability is to be expected, and if that variability is too great, you won't find any differentially expressed genes (DGE). I would suggest giving limma a try with a linear model that corrects for the important factors gleaned from the PCA. If that doesn't get you any DGE, you can still try GSEA and identify trends at the "pathway" level.
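Beyond the linear model itself, what helps limma with small groups is empirical Bayes moderation: each gene's residual variance is shrunk toward a common prior, so genes whose variance is small by chance do not get extreme t-statistics. A stdlib Python sketch of the moderated t for a single gene and coefficient; the prior degrees of freedom `d0` and prior variance `s0_2` are fixed by hand as hypothetical values, whereas limma's `eBayes` estimates them from all genes (reported as `df.prior` and `s2.prior`).

```python
import math

def moderated_t(beta, s2, d, unscaled_var, d0, s0_2):
    """Moderated t-statistic for one gene and one coefficient.

    beta: estimated coefficient; s2: residual variance on d df;
    unscaled_var: the coefficient's diagonal entry of (X'X)^-1;
    d0, s0_2: prior df and prior variance (hypothetical values here).
    """
    # Posterior variance: weighted average of prior and gene-wise variance
    s2_post = (d0 * s0_2 + d * s2) / (d0 + d)
    t = beta / math.sqrt(s2_post * unscaled_var)
    return t, d + d0

# Made-up numbers: a gene whose residual variance is small by chance
beta, s2, d, v = 1.0, 0.01, 7, 0.4
ordinary_t = beta / math.sqrt(s2 * v)
mod_t, mod_df = moderated_t(beta, s2, d, v, d0=4.0, s0_2=0.09)
print(round(ordinary_t, 2), round(mod_t, 2), mod_df)
```

With these made-up numbers the moderated t is about 8.0 against an ordinary t of about 15.8, on 11 instead of 7 degrees of freedom; setting `d0=0` recovers the ordinary t.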


Thanks! I will try that out!
