Question

Probability Of Expression Changes 2-, 5-, ...100-Fold

13.5 years ago

Israel Barrantes ▴ 790

In RNA-seq and other gene expression approaches, one usually calculates the probability of obtaining a count Y (measured in sample B) given a count X (measured in sample A), such as in the case discussed by Audic and Claverie (Genome Res. 1997 Oct;7:986).

Our case is the following: given the read counts of two samples (X and Y) for each available transcript, we would like to obtain a list of all transcript IDs whose true expression level is, with 95% confidence, at least 5-fold different between the two samples. Which statistical test could help in this case?

Ideally, we would like to choose between 95% and 99% confidence levels and between arbitrary x-fold expression cut-offs, obtaining e.g. a list of all transcripts that are 2-fold, 20-fold or 100-fold overexpressed at the chosen error probability (p < given value).

rna statistics gene • 4.3k views

ADD COMMENT • link updated 13.5 years ago by Marcin Cieslik ▴ 520 • written 13.5 years ago by Israel Barrantes ▴ 790

Like the authors of the 2 answers so far, I don't understand the question. Could you add some additional information or consider rewording it? I'm not sure it's answerable in its current form.

ADD REPLY • link 13.5 years ago by User 59 13k

Here is the question, posed in a different way:

We have the read counts of two samples, X and Y, for each different transcript available.

The question is now the following: Give me a list of all transcript names, the true expression level of which is, with 95 % confidence, at least 5-fold different in the two samples.

Ideally, we would like to choose between 95% and 99% confidence levels and between arbitrary x-fold expression cut-offs, obtaining e.g. a list of all transcripts that are 2-fold, 20-fold or 100-fold overexpressed at the chosen error probability (p < given value).

ADD REPLY • link 13.5 years ago by Israel Barrantes ▴ 790

Ah, I see. This is something I have been asking myself for a long time, but I don't have a solution. I will keep watching this thread...

ADD REPLY • link 13.5 years ago by Lyco ★ 2.3k

I edited the question accordingly.

ADD REPLY • link 13.5 years ago by Israel Barrantes ▴ 790

Ram · Answer 1 · 2011-06-07

I am not entirely sure what your question is. You can calculate the probability of finding 2x or 5x enrichment with the Audic & Claverie statistics, but of course the probability depends on the actual read counts, not only on the fold factor. There is an online server for performing the calculation and, according to their webpage, a 'unix version' of the program can be downloaded from http://www.igs.cnrs-mrs.fr/SpipInternet/spip.php?article168
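To illustrate why the counts matter and not only the factor, here is a minimal Python sketch of the Audic & Claverie formula, p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)), where N1 and N2 are the total read counts of the two libraries. The function names are illustrative and this is not the code of the server or program mentioned above.

```python
import math

def ac_pmf(y, x, n1, n2):
    """Audic & Claverie probability of y reads in sample 2, given x reads
    in sample 1 and library sizes n1 and n2 (computed in log space)."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)

def ac_upper_tail(y, x, n1, n2, tol=1e-15):
    """P(Y >= y | x): sum the probabilities upward from y until the
    remaining terms are negligible relative to the running total."""
    mode = x * n2 / n1  # the terms decrease once k is past this value
    total, k = 0.0, y
    while True:
        term = ac_pmf(k, x, n1, n2)
        total += term
        if k > mode and term <= tol * total:
            break
        k += 1
    return min(total, 1.0)

# Two 5-fold increases with equal library sizes but different depths.
p_low = ac_upper_tail(10, 2, 1_000_000, 1_000_000)     # 2 -> 10 reads
p_high = ac_upper_tail(100, 20, 1_000_000, 1_000_000)  # 20 -> 100 reads
print(p_low, p_high)
```

Both comparisons are 5-fold, yet the second gives a far smaller probability than the first because it rests on more reads.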

score 1 · Answer 2 · 2011-06-07

I am not sure I completely understand the question. But the fold changes you find will really depend on what your samples are. If, for instance, you compare a knockout strain with a native strain, fold changes will be very high (or infinite if you assume the knockout really gave zero expression). The same goes for null alleles. A hundred-fold change would almost certainly be something like that. Copy number variations also tend to give high fold changes in expression.

On the other hand, we often search for effects of a treatment in two samples that are otherwise as comparable as can be, e.g. the same individual before and after treatment. In nutritional interventions, for instance, we hardly ever find high fold changes. Two-fold would already be very high. But what do you expect? You normally don't get blond hair all of a sudden from eating candies (although some of these might give you blue hair).

Ram · Answer 3 · 2011-06-23

(I write from memory as I do not have access to the paper, so this might not be accurate)

Given two read counts X and Y for a transcript and the total numbers of sequenced reads (A and B), the Poisson margin test (introduced here http://www.ncbi.nlm.nih.gov/pubmed/21385042) gives the probability of observing a count difference at least as high as D = Y - X purely by chance, assuming that the Poisson processes that generated X and Y have the same (but unknown) rate. In other words, a low probability allows one to reject the hypothesis that there is no fold change.
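For the original question (a minimum fold change rather than no change), one option is the standard exact conditional test for the ratio of two Poisson rates. This is not the Poisson margin test from the paper linked above, only a related sketch. If X and Y are Poisson with per-read rates scaled by A and B, then conditional on X + Y = n, Y is binomial with success probability kB / (A + kB) when the true fold change is k. The function names and the example counts are hypothetical, and the p-values are per transcript, without multiple-testing correction.

```python
import math

def binom_upper_tail(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def fold_change_pvalue(x, y, a, b, fold=1.0):
    """One-sided p-value for H0: rate_Y / rate_X <= fold.
    x, y are the read counts; a, b are the total reads of each sample."""
    p0 = fold * b / (a + fold * b)  # chance a read falls in Y under H0
    return binom_upper_tail(y, x + y, p0)

def transcripts_above_fold(counts, a, b, fold, alpha):
    """IDs whose expression in Y exceeds `fold` times that in X
    at significance level alpha (uncorrected)."""
    return [tid for tid, (x, y) in counts.items()
            if fold_change_pvalue(x, y, a, b, fold) < alpha]

# Hypothetical counts: transcript ID -> (reads in X, reads in Y).
counts = {"t1": (2, 10), "t2": (10, 500), "t3": (100, 110)}
hits = transcripts_above_fold(counts, 1_000_000, 1_000_000, fold=5, alpha=0.05)
print(hits)
```

Setting `fold=1.0` tests against the hypothesis of no increase. To detect underexpression, swap the roles of the two samples. Choosing `alpha=0.01` instead of `0.05` corresponds to the 99% level mentioned in the question.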

A different approach is to (somehow) estimate the rates of the generating processes and to calculate the p-value exactly (using the negative binomial: http://precedings.nature.com/documents/4282/version/1/files/npre20104282-1.pdf or negative binomial differential: http://smithlab.cmb.usc.edu/histone/rseg/rseg-supp.pdf)