I have calculated log fold change values for RNA-Seq data and would like to estimate the significance of the results. I know DESeq already does this, but I want to do it manually after normalising the counts to RPKM.
Assuming you've truly done all of the required normalization, then you could just use a T-test or ANOVA (or other applicable linear model). Remember that you'll have lower power than a method like DESeq2 or edgeR since you'll not be using information sharing, but that's the simple manual route.
BTW, why do you want to do this? The various count-based packages are pretty nice and it's usually not a good idea to reinvent the wheel unless you have a good reason.
I just want to compare different methods on my data, because the distribution of log fold changes is shifted when I use RPKM, and in my case that makes sense (it looks a bit strange for all log fold change values to be centered around 0 when one of my genes turns off all expression in the cell).
The output I want to get is the p-values for every gene after the log fold change, just like with DESeq.
If you have a log2(fold change) for each gene, a T-test or ANOVA gives the overall p-value for the library (population). So what would you suggest for assigning a p-value to each gene's wt/ko pair, to tell us whether the fold change is significant or not?
The T-test or ANOVA will give the per-gene p-values, since you're testing by gene (not directly comparing columns of genes from two samples against each other). In cases with no replicates, there are no really meaningful p-values possible (the best you can do is use something like GFold).
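To make "testing by gene" concrete, here is a minimal sketch of a per-gene Welch t-test using only the Python standard library. The gene names, the RPKM values, the log2(RPKM + 1) transform, and all function names are assumptions made up for illustration; none of them come from the thread. The p-value is obtained by numerically integrating the t density, which is only meant to keep the sketch self-contained (in practice a library routine such as scipy.stats.ttest_ind with equal_var=False would be the usual choice).

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def welch_t_test(a, b):
    """Welch's unequal-variance t-test; returns (t, df, p). Needs >= 2 values per group."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0:
        # No variation in either group: the statistic is undefined.
        return float("nan"), float("nan"), float("nan")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)

def log2p1(values):
    """log2(x + 1) transform (an assumed choice, to avoid log of zero)."""
    return [math.log2(v + 1.0) for v in values]

# Hypothetical RPKM values: gene -> (wt replicates, ko replicates)
rpkm = {
    "geneA": ([10.0, 12.0, 11.0], [40.0, 44.0, 38.0]),
    "geneB": ([5.0, 9.0, 7.0], [6.0, 8.0, 7.5]),
}

# One test per gene (row), never one test across the whole column of genes.
for gene, (wt, ko) in rpkm.items():
    wt_log, ko_log = log2p1(wt), log2p1(ko)
    log2fc = mean(ko_log) - mean(wt_log)
    t, df, p = welch_t_test(ko_log, wt_log)
    print(f"{gene}\tlog2FC={log2fc:.2f}\tt={t:.2f}\tdf={df:.1f}\tp={p:.4f}")
```

Each gene gets its own test from its own replicates, which is why this needs replicates in both groups and why, with only a few of them, the power is low compared to methods that share information across genes.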
Even if there are replicates, a t-test is not applicable unless you have many replicates (~10X2, i.e. about 10 per group).
In most cases people do up to 3 replicates. Let's assume that the increase or decrease of gene X is random: if the gene expression increased in all 3 cases, it's like getting 3 heads in a row, a probability of 1/8.
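The coin-flip arithmetic above can be written out as a small sketch. It assumes, as the comment does, that under the null each replicate is independently equally likely to show an increase or a decrease; the function name is made up for illustration.

```python
from fractions import Fraction

def same_direction_probability(n_replicates, two_sided=False):
    """Chance that all replicates change in the same direction by luck alone.

    One-sided: all increase (or all decrease), i.e. (1/2) ** n.
    Two-sided: all increase OR all decrease, i.e. 2 * (1/2) ** n.
    """
    p = Fraction(1, 2) ** n_replicates
    return 2 * p if two_sided else p

for n in (3, 5, 10):
    one = same_direction_probability(n)
    two = same_direction_probability(n, two_sided=True)
    print(f"{n} replicates: one-sided {one}, two-sided {two}")
# 3 replicates: one-sided 1/8, two-sided 1/4
# 5 replicates: one-sided 1/32, two-sided 1/16
# 10 replicates: one-sided 1/1024, two-sided 1/512
```

This only looks at the direction of change and ignores its size, so it is a lower bound on what a sign-based argument can show, not a replacement for a proper test.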
I think that most of the power of DESeq or cuffcompare (and my understanding of these tools is poor) is in determining whether the expression increased or decreased in an experiment, i.e. whether the number of mRNA molecules of gene X differed between the two conditions. This doesn't mean that the next time you run the experiment it will (most probably) happen again.
(written 10.3 years ago by Asaf; updated 2.9 years ago by Ram)
You don't need ~10 samples per compared group to use a T-test; that's simply nonsense as a general statement. In the special case of gene expression data that's certainly true, and of course even then your power is going to be terrible compared to DESeq/edgeR/etc., but that wasn't the question posed (and I made reference to the power issue anyway).
If you only have the fold-change values, you most definitely need more than 10 replicates for the suggested t-test to be applicable if you test each gene independently. I know that people do t-tests on triplicates, but that's just nonsense.
Agreed. Note that I was replying to needing ~10 samples per group as a general requirement, not one specific to gene-expression.
In this tutorial (http://cgrlucb.wikispaces.com/Spring+2012+DESeq+Tutorial) they applied DESeq with 2-3 replicates for 2 conditions. My question is: how could I do the same, but with the log2 fold change values computed from RPKM?
A log2(foldchange) in an RPKM doesn't make any sense (that's like saying you have percentage changes stored in apples). I assume you have RPKMs for two groups and want to compare them. You can use a T-test, but as mentioned above the results won't be worth much. You're better off either not using RPKMs or using something like cuffdiff, which has somewhat different requirements.