I am comparing paired-end RNA-Seq data from two samples (control and treated) on Galaxy. The FPKM values in my cufflinks output for the 2 samples differ from the 2 values that cuffdiff..
Below is a snapshot of a gene with different FPKM values (I am not displaying the gene name).
Cufflinks output:
Gene  length  coverage  FPKM     FPKM_conf_lo  FPKM_conf_hi  FPKM_status  Sample
xyz   -       -         39.9094  38.9553       40.8635       OK           Control
xyz   -       -         19.4664  18.7786       20.1542       OK           Treated
Hi,
as you can read in the online documentation at http://cufflinks.cbcb.umd.edu/howitworks.html#reps
the cuffdiff tool computes FPKM in a slightly different way from cufflinks: it uses a dispersion model derived from all the samples you are analyzing. This passage may help:
"Cuffdiff takes an approach to differential expression analysis that is radically different from most other RNA-seq analysis packages. Because Cufflinks calculates individual transcript abundances, it is very sensitive when looking for differentially expressed genes, especially when those genes are alternatively spliced. However, in order to deal with the overdispersion that is known to exist among biological replicates, Cuffdiff fits a model for fragment count variances in each condition prior to doing any testing. Cuffdiff uses the LOCFIT regression package, written by Catherine Loader and Jiayang Sun, for this purpose. Cuffdiff models fragment count overdispersion the same way Anders and Huber do in their DESeq package to derive a count dispersion model for each experimental condition. If only one replicate is available in each condition, Cuffdiff pools the conditions together to derive a dispersion model. The dispersion model, which describes variances of fragment counts across replicates, is then used to calculate the variances on a gene's relative expression level across replicates. It is these expression level variances that are used during testing for differences at the gene and transcript level."
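To make the idea of overdispersion in that passage a bit more concrete, here is a minimal Python sketch. It is not Cuffdiff's code. It assumes the commonly used negative binomial form, variance = mean + dispersion * mean^2, and the function name and the dispersion value of 0.5 are made up for illustration.

```python
def count_variance(mean_count, dispersion):
    """Variance of a fragment count under a negative binomial model.

    Assumption (not taken from Cuffdiff's source): var = mean + dispersion * mean**2.
    With dispersion = 0 this reduces to the Poisson case, where var = mean.
    """
    if mean_count < 0 or dispersion < 0:
        raise ValueError("mean_count and dispersion must be non-negative")
    return mean_count + dispersion * mean_count ** 2


if __name__ == "__main__":
    # 0.5 is a hypothetical dispersion, chosen only to show the trend.
    for mean in (10, 100, 1000):
        poisson_var = count_variance(mean, 0.0)
        overdispersed_var = count_variance(mean, 0.5)
        print(mean, poisson_var, overdispersed_var)
```

The point is only that the extra variance grows with the square of the mean, so highly expressed genes are affected most when the dispersion is estimated from all samples together.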
Hi!
I am currently having a similar problem to yours. When I run the cuffdiff package (with reference) I get FPKM values that differ greatly from the FPKM values per replicate (n=3) obtained when doing the same with cufflinks. Let me show you an example:
These are the FPKM values obtained from the CuffDiff output:
        A        B        C        D         E         F
gene    107.894  62.4416  16.6914  2.18289   0.196219  0.977153
gene.1  59.4121  34.5872  8.18243  2.01778   0.210608  1.06329
gene.2  41.2712  22.9315  7.98328  0.256128  3.48E-06  0
And these are the FPKM values from the cufflinks output:
As you can see, it seems unlikely that the per-replicate gene-level FPKM values from cufflinks (A1, A2 and A3) could combine into the cuffdiff FPKM value (A). The same goes for each transcript isoform (gene.1 and gene.2).
Can anyone point me in the right direction to go from the cufflinks FPKMs to the cuffdiff FPKMs?
I'm also having an issue with results from cuffdiff in Galaxy: I'm getting a lot of FPKM values of 0 (zero). For all genes significant at p<0.001 between two conditions, one of the conditions has a value of zero in every case.
What is most troubling, however, is that running the exact same .bam and gtf files with the exact same options on the GenePattern server does not produce such a result. That is, the genes with the highest significance (lowest p) do not have FPKM values of zero in either sample 1 or sample 2, as they do in the Galaxy output.
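One way to put numbers on this is to count, in each cuffdiff result file, how many genes below the p-value cutoff have an FPKM of zero in one condition. The sketch below assumes the usual tab-delimited gene_exp.diff layout with columns named gene_id, value_1, value_2 and p_value; the function name and the demo rows are hypothetical, not real output.

```python
import csv
import io


def zero_fpkm_hits(lines, p_cutoff=0.001):
    """Return (number of genes with p < p_cutoff, gene ids with zero FPKM in one condition).

    `lines` is any iterable of text lines (an open file or io.StringIO)
    in the assumed gene_exp.diff layout, header included.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    significant = 0
    zero_hits = []
    for row in reader:
        if float(row["p_value"]) >= p_cutoff:
            continue
        significant += 1
        if float(row["value_1"]) == 0.0 or float(row["value_2"]) == 0.0:
            zero_hits.append(row["gene_id"])
    return significant, zero_hits


# Made-up rows, only to show the expected shape of the input.
DEMO = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
    "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n"
    "t1\tg1\t-\tchr1:1-100\tq1\tq2\tOK\t0\t25.0\tinf\t0\t0.0001\t0.01\tyes\n"
    "t2\tg2\t-\tchr1:200-300\tq1\tq2\tOK\t10.0\t40.0\t2\t3.1\t0.0005\t0.02\tyes\n"
    "t3\tg3\t-\tchr1:400-500\tq1\tq2\tOK\t5.0\t5.5\t0.14\t0.2\t0.4\t0.6\tno\n"
)

if __name__ == "__main__":
    print(zero_fpkm_hits(io.StringIO(DEMO)))
```

Running the same function on the Galaxy file and on the GenePattern file (opened with `open(path, newline="")`) would show whether the zero-valued genes are specific to one server.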