Entering edit mode
9.4 years ago
rodrigo.nicodemos82
▴
10
Hello,
I am performing RNAseq analysis using TopHat and Cufflinks programs. I have a question for why a gene I am interested is being called Not Significant by Cuffdiff.
Here the output for my gene of interest Rh4:
gene sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value signific
Rh4 WT MUT OK 1158.87 0.856892 -10.4013 -4.12026 0.1178 0.404333 no
I am not familiar with the statistics Cuffdiff uses to call my gene of interest Not Significant. Can someone explain it to me?
Thank you very much.
Rodrigo
what's the variance, do you have any replicates?
As Michael says, Variance or replicates will most likely be your issue. Have you thought about plotting the FPKM values of each replicate to see how variable it is? - You can use cummeRbund to read in a cuffdiff output and extract the FPKM values.
The FPKM values for the replicates were:
WT_1
: 1245.24,WT_2
: 874.11 andMUT_1
: 0.897,MUT_2
: 0.764,MUT_3
: 0.823. The third replicate for the WT condition gave very few reads, which is why I didn't include it. I only used two replicates from the MUT condition to compare to the two replicates from the WT condition. I am not convinced it is a replicate issue or a variance issue, since the trend it pretty obvious to me, but Cuffdiff it not picking it up statistically.Have you ran your results through DESeq2? - DESeq2 has a function (I think it's called a Cook's distance), to figure out if samples have too high a variance for a given condition, thus it excludes them. The other thing is to pull out a few of the "Top Hits" from your analysis and see what their FPKM's are like, see if the variance of your wild type fits with the difference your seeing in the case above.
This is actually an interesting situation - I agree with the OP that the gene should come up as differentially expressed - but the way this and probably all other tools work is a bit mysterious and frankly occasionally plain wrong.
Even before multiple comparison testing the p value is already huge 0.1
I have a hard time believing that getting a 1245 and 874 versus 0.9 and 0.7 could possibly be that likely to obtain.
Caveat emptor before buying into any tool.
That seems pretty wacky, but we're only seeing 1 gene. In general the p-values you see will depend also on the rest of the population (though that gene *is* very extreme). The numbers above suggest that you have highly scattered data - that your wt and mutants are *very* different from each other overall (is this true?). The second thing is that Cuffdiff works by summing the values of all transcripts from the locus (which makes for odd numbers sometime). Is this a complicated locus? Do you have the individual transcript values? Have you looked at the read mappings in a genome browser? Any artifacts going on?