Question

How Do You Test The Global Null Hypothesis In Expression Analysis?

2

14.3 years ago

Jeremy Leipzig 23k

What is the standard way to test whether two samples exhibit significantly different expression on a global basis?

Most RNA-seq and microarray analysis packages seem to be concerned with identifying differentially expressed genes on a gene-by-gene basis, then correcting for multiple testing.

Is there a single statistic, like an F-test, that is typically used for comparing all the genes of two samples at once? Or do people just aggregate the individual tests?

gene rna microarray • 5.5k views

ADD COMMENT • link updated 14.1 years ago by Michael.James.Clark ▴ 570 • written 14.3 years ago by Jeremy Leipzig 23k

0

What about a simple correlation? Computing p-values with only two samples in hand is quite complex. You just can NOT run a parametric test with only one observation from each population, even with 10,000 gene expression values. You need multiple observations to estimate the parameters of your population (mean and variance, in the Gaussian case). I would personally go for a nonparametric randomization test, as David suggested below.
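As a rough illustration of the "simple correlation" idea, here is a minimal sketch that computes the Pearson correlation between two expression profiles using only the Python standard library. The function name `pearson` and the toy values are assumptions made up for this example; they are not taken from any particular package or dataset.

```python
import math


def pearson(profile_a, profile_b):
    """Pearson correlation between two equal-length expression profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of genes")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    ss_a = sum((a - mean_a) ** 2 for a in profile_a)
    ss_b = sum((b - mean_b) ** 2 for b in profile_b)
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical toy profiles (one value per gene), for illustration only.
sample_1 = [1.0, 2.0, 3.0]
sample_2 = [2.0, 4.0, 6.0]
r = pearson(sample_1, sample_2)
```

This only gives a descriptive measure of similarity; as noted above, it does not by itself yield a p-value for two single samples.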

ADD REPLY • link 14.3 years ago by toni ★ 2.2k

score 2 · Answer 1 · 2011-01-30

Typically we use a clustering approach to estimate how closely related expression profiles are on a global scale. Hierarchical and k-means clustering are both commonly used. The similarity of the clusters can then be calculated.

So yes, it is indeed performing multiple independent tests and aggregating the results, but that's really the nature of the data.

Ram · Answer 2 · 2011-01-30

I don't know of an existing method for this; I suspect the answer would come from the general statistical literature. One approach that comes to mind is to measure the distance between the two profiles (e.g. Euclidean distance, or your preferred metric), call it D_obs, and then use permutation to put a nonparametric p-value on that distance. In each permutation, you shuffle one array and measure the resulting distance as D_perm. Then:

P = (number of times D_perm > D_obs) / N_perms.
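The permutation scheme described above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the function names `euclidean` and `permutation_pvalue`, the default of 1000 permutations, and the fixed seed are all made up for this example, and Euclidean distance is used simply because it is the metric mentioned in the answer.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_pvalue(profile_a, profile_b, n_perms=1000, seed=0):
    """Return (D_obs, P), where P = (# of D_perm > D_obs) / n_perms.

    Each permutation shuffles the gene order of profile_b only, which
    breaks the gene-to-gene pairing between the two profiles.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    d_obs = euclidean(profile_a, profile_b)
    shuffled = list(profile_b)  # copy so the caller's data is not modified
    exceed = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)
        d_perm = euclidean(profile_a, shuffled)
        if d_perm > d_obs:
            exceed += 1
    return d_obs, exceed / n_perms


# Hypothetical toy profiles (one value per gene), for illustration only.
d_obs, p = permutation_pvalue([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

With this definition, a small P means the observed distance is rarely exceeded by shuffled profiles, i.e. the two samples are further apart than random pairings of their values would typically be.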

score 0 · Answer 3 · 2011-01-30

0

14.3 years ago

Istvan Albert 102k

I would say that the null hypothesis is that the samples come from the same population; if even one gene is differentially expressed, the null hypothesis is rejected with the given p-value.

You just need to come up with the definition of 'global' (how many of the total) and check that there are indeed that many differentially expressed genes. You shouldn't need to aggregate the p-values (as the methods themselves should account for the multiple tests).

ADD COMMENT • link 14.3 years ago by Istvan Albert 102k

0

I can't believe that, with the 10,000 microarray papers out there, there isn't already a standard way of computing the probability that two samples are from the same population.

ADD REPLY • link 14.3 years ago by Jeremy Leipzig 23k