Question

p-values for log fold change in RPKM

2

10.3 years ago

Juan Cordero ▴ 140

Hi!

I have calculated log fold change values for RNA-Seq data and would like to estimate the significance of the results. I know DESeq already does this, but I want to do it manually after normalising the counts to RPKM.

Any ideas?

Thanks in advance

RPKM p-value RNA-Seq Normalization • 11k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Juan Cordero ▴ 140

Ram · Accepted Answer · 2014-08-11

4

10.3 years ago

Devon Ryan 104k

Assuming you've truly done all of the required normalization, then you could just use a T-test or ANOVA (or other applicable linear model). Remember that you'll have lower power than a method like DESeq2 or edgeR since you'll not be using information sharing, but that's the simple manual route.
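As a rough illustration of the simple manual route, here is a minimal sketch of a per-gene two-sample T-test. Several things in it are assumptions for illustration only: the gene names and log2(RPKM + 1) values are made up, the Welch (unequal variance) form of the test is one possible choice, and the p-value comes from a simple numerical integration of the t density so that the example needs nothing beyond the Python standard library. In practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would be the usual choice.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Return (t, p) for one gene; a and b are replicate values per group."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se2 = va + vb
    if se2 == 0:
        return float("nan"), float("nan")  # no variation, test undefined
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, t_two_sided_p(t, df)


# Hypothetical log2(RPKM + 1) values: (wild type replicates, knockout replicates)
expr = {
    "geneA": ([5.1, 5.3, 4.9], [7.2, 7.0, 7.4]),
    "geneB": ([3.0, 3.4, 2.7], [3.1, 2.9, 3.3]),
}

# One test per gene (row), not one test per sample (column)
results = {gene: welch_t_test(wt, ko) for gene, (wt, ko) in expr.items()}
for gene, (t, p) in results.items():
    print(f"{gene}: t = {t:.2f}, p = {p:.4g}")
```

Each gene is tested independently here, so nothing is shared across genes when estimating the variance, which is where the loss of power relative to DESeq2 or edgeR comes from.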

BTW, why do you want to do this? The various count-based packages are pretty nice and it's usually not a good idea to reinvent the wheel unless you have a good reason.

ADD COMMENT • link 10.3 years ago by Devon Ryan 104k

0

I just want to compare different methods on my data. With RPKM the distribution of log fold changes is shifted, which makes sense in my case: it looks a bit strange for all log fold change values to be centered around 0 when one gene in my experiment turns off all expression in the cell.

The output I want is a p-value for the log fold change of every gene, just like with DESeq.

Thanks

ADD REPLY • link 10.3 years ago by Juan Cordero ▴ 140

0

Hey Devon,

If you have a log2(fold change) for each gene, a T-test or ANOVA gives an overall p-value for the library (population). So what would you suggest for assigning a p-value to each gene in the wt/ko comparison, to tell us whether its fold change is significant?

Thanks !

ADD REPLY • link 10.3 years ago by Chirag Nepal ★ 2.4k

1

The T-test or ANOVA will give the per-gene p-values, since you're testing by gene (not directly comparing columns of genes from two samples against each other). In cases with no replicates, there are no really meaningful p-values possible (the best you can do is use something like GFold).

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

Even if there are replicates, a t-test is not applicable unless you have many of them (~10 per group).

In most cases people run at most 3 replicates. Assume the direction of change of gene X is random: if its expression increased in all 3 replicates, that's like getting 3 heads in a row, a probability of 1/8.
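The coin-flip arithmetic can be written out as a small sketch. The function name is hypothetical, and the only assumption is the one stated above: each replicate is equally likely to show an increase or a decrease, independently of the others.

```python
def p_all_increase(n_replicates):
    """Chance that every replicate shows an increase if direction is a fair coin flip."""
    return 0.5 ** n_replicates


for n in (3, 5, 10):
    one_direction = p_all_increase(n)
    either_direction = 2 * one_direction  # all up or all down
    print(f"{n} replicates: all up = {one_direction:.4f}, "
          f"all the same way = {either_direction:.4f}")
```

With 3 replicates the most extreme outcome has probability 1/8 in one direction (1/4 if either direction counts), so a test based on direction alone cannot go below that.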

I think that most of the power of DESeq or cuffcompare (and my understanding of these tools is poor) lies in determining whether expression increased or decreased in a given experiment, i.e. whether the number of mRNA molecules of gene X differed between the two conditions. That doesn't mean it will (most probably) happen again the next time you run the experiment.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Asaf 10k

0

You don't need ~10 samples per compared group to use a T-test; as a general statement that's simply nonsense. In the special case of gene expression data it's certainly true, and of course even then your power is going to be terrible compared to DESeq/edgeR/etc., but that wasn't the question posed (and I made reference to the power issue anyway).

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

If you only have the fold-change values, you most definitely need more than 10 replicates for the suggested t-test to be applicable when you test each gene independently. I know that people do t-tests on triplicates, but that's just nonsense.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Asaf 10k

0

Agreed. Note that I was replying to needing ~10 samples per group as a general requirement, not one specific to gene-expression.

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

In this tutorial (http://cgrlucb.wikispaces.com/Spring+2012+DESeq+Tutorial) they applied DESeq with 2-3 replicates for 2 conditions. My question is: how could I do the same, but with the log2 fold change values in RPKM?

ADD REPLY • link 10.3 years ago by Juan Cordero ▴ 140

0

A log2(fold change) in RPKM doesn't make any sense (that's like saying you have percentage changes stored in apples). I assume you have RPKMs for two groups and want to compare them. You can use a T-test, but as mentioned above the results won't be worth much. You're better off either not using RPKMs or using something like cuffdiff, which has somewhat different requirements.
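To make the units point concrete, here is a minimal sketch: RPKM values carry units (reads per kilobase of transcript per million mapped reads), while the log2 fold change is computed from a ratio of two RPKM values and is therefore unitless. The function names, the example numbers, and the pseudocount of 1 (added only to avoid dividing by zero) are assumptions for illustration.

```python
import math
from statistics import mean


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(rpkm_group_a, rpkm_group_b, pseudocount=1.0):
    """Unitless log2 ratio of mean RPKM in group B over group A."""
    return math.log2((mean(rpkm_group_b) + pseudocount)
                     / (mean(rpkm_group_a) + pseudocount))


# Hypothetical gene: 500 reads, 2000 bp long, library of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))

# Hypothetical RPKM replicates for two groups (wild type, knockout)
wt = [24.0, 26.0, 25.0]
ko = [49.0, 53.0, 51.0]
print(log2_fold_change(wt, ko))
```

The replicate RPKM values in `wt` and `ko`, not the single fold change derived from them, are what a per-gene test would take as input.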

ADD REPLY • link 10.3 years ago by Devon Ryan 104k