Question

Doubts Regarding Calculation Of Fpkm Values By Cuffdiff?

11.4 years ago

Sameet ▴ 300

Hi,

Recently we sequenced a bunch of RNA-seq libraries. I wanted to compare gene expression levels across all of them, so I ran cuffdiff with the standard annotation GTF file and the TopHat-aligned BAM files, then extracted the FPKM matrix using cummeRbund. What confuses me is that the reported FPKM values differ depending on the number of BAM files included in the run. The differences are not huge, but they are there. Can anybody explain why this happens and how to get around it?

cuffdiff rnaseq analysis • 3.4k views

ADD COMMENT • link updated 11.4 years ago by Nick ▴ 290 • written 11.4 years ago by Sameet ▴ 300

score 0 · Answer 1 · 2013-11-15

11.4 years ago

Istvan Albert 102k

Not sure what you mean by "depending on the number of bam files".

If you change the data, the FPKM will change. It can't really stay the same: there is no rule that says a gene has to be expressed at exactly the same level over time or across tissues, conditions, etc., not to mention the inherent variability of the measurement process.

Of course, the question is how much it changes. Is the change large enough to invalidate the claim you would be making? If so, either the problem is with the claim, or the new data is not compatible with the hypothesis you were after.

ADD COMMENT • link 11.4 years ago by Istvan Albert 102k

I think I was not clear. My situation is as follows. I have three BAM files, a.bam, b.bam, and c.bam, and a single annotation file, annotation.gtf. If I run cuffdiff with this annotation file on a.bam and b.bam, I get one set of FPKM values. If I then run it on a.bam and c.bam with the same annotation file and the same parameters, I get different FPKM values for the same genes in condition a! That has me flummoxed!

ADD REPLY • link 11.4 years ago by Sameet ▴ 300

I understand now. I think this is probably caused by the sequencing bias estimation; see, for example, http://cufflinks.cbcb.umd.edu/howitworks.html#hsbi. This is described for Cufflinks, but I would imagine it applies to Cuffdiff as well.

ADD REPLY • link 11.4 years ago by Istvan Albert 102k

score 0 · Answer 2 · 2013-11-15

Istvan is right - sequencing bias is one possible reason. The other (to me more likely) is the normalisation. The FPKM values that cuffdiff produces are normalised, but the normalisation factors depend on all samples included in the run. In your case the sets of samples are not identical, which may lead to different normalisation factors and then to different FPKM values.
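To make the normalisation point concrete, here is a rough Python sketch of a median-of-ratios (geometric) scaling, which as far as I know is the kind of library normalisation cuffdiff applies by default. It is not cuffdiff's actual code: the function name `size_factors` and the toy counts for a, b, and c are made up for illustration, and the real tool works on fragment counts per locus with more machinery around it.

```python
import math
from statistics import median


def size_factors(counts_by_sample):
    """Median-of-ratios scaling factors for a dict of sample -> per-gene counts.

    Hypothetical helper for illustration only, not part of cuffdiff.
    """
    samples = list(counts_by_sample)
    n_genes = len(counts_by_sample[samples[0]])
    # Keep genes with a nonzero count in every sample so the logs are defined.
    genes = [g for g in range(n_genes)
             if all(counts_by_sample[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across the samples in THIS run.
    log_geo_mean = {
        g: sum(math.log(counts_by_sample[s][g]) for s in samples) / len(samples)
        for g in genes
    }
    # Each sample's factor is the median ratio of its counts to that mean.
    return {
        s: math.exp(median(math.log(counts_by_sample[s][g]) - log_geo_mean[g]
                           for g in genes))
        for s in samples
    }


# Toy per-gene counts (invented numbers, four genes per sample).
a = [100, 200, 300, 400]
b = [110, 190, 600, 420]
c = [50, 100, 150, 800]

f_ab = size_factors({"a": a, "b": b})  # a analysed together with b
f_ac = size_factors({"a": a, "c": c})  # a analysed together with c

# The same raw counts for a get rescaled differently in the two runs.
norm_a_with_b = [x / f_ab["a"] for x in a]
norm_a_with_c = [x / f_ac["a"] for x in a]
print(f_ab["a"], f_ac["a"])
```

With these toy numbers the factor for a comes out at roughly 0.96 when paired with b and roughly 1.41 when paired with c, so every normalised value for a shifts even though a.bam itself has not changed. Any FPKM computed from the scaled counts inherits that shift.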

score 0 · Answer 3 · 2013-11-15

11.4 years ago

Nick ▴ 290

By the way, I stopped using cuffdiff a few months ago because the FDR values produced by the most recent version did not seem at all convincing. I would recommend using edgeR or DESeq instead.

ADD COMMENT • link 11.4 years ago by Nick ▴ 290