Doubts Regarding Calculation Of FPKM Values By Cuffdiff?
11.0 years ago
Sameet ▴ 300

Hi,

Recently we sequenced a batch of RNA-seq libraries, and I wanted to compare gene expression levels across all of them. I ran cuffdiff with the standard annotation GTF file and the TopHat-aligned BAM files, then extracted the FPKM matrix using cummeRbund. What confuses me is that the reported FPKM values change depending on which BAM files are included in the run. The differences are not huge, but they are there. Can anybody explain why this happens and how to get around it?

11.0 years ago

Not sure what you mean by "depending on the number of BAM files".

If you change the data, the FPKM will change. It can't really stay the same: there is no rule that says a gene has to be expressed at exactly the same level over time or across tissues, conditions, etc., not to mention the inherent variability of the measurement process.

Of course, the question is how much does it change? Is the change large enough to invalidate the claim you would be making? If so, either the claim is the problem, or the new data is not compatible with the hypothesis you were testing.

I think I was not clear. My situation is as follows. I have three BAM files, a.bam, b.bam, and c.bam, and a single annotation file, annotation.gtf. If I run cuffdiff with this annotation on a.bam and b.bam, I get one set of FPKM values. But if I run it on a.bam and c.bam with the same annotation file and the same cuffdiff parameters, I get different FPKM values for the same genes in condition a! That has me flummoxed!
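As a point of reference, the textbook (unnormalised) FPKM for a gene uses only that sample's own numbers: FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments). The minimal Python sketch below uses hypothetical numbers; it is only the textbook definition, not cuffdiff's abundance estimation. It shows that nothing from b.bam or c.bam enters this formula, so a difference between the two runs has to come from something cuffdiff does on top of it, such as bias correction or cross-sample normalisation.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical numbers for one gene in a.bam: 500 fragments on a 2,000 bp
# transcript, with 10 million mapped fragments in the whole library.
# Only a.bam's own counts appear here; the other samples play no role.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```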

I understand now. I think this is probably caused by the sequencing bias estimation; see for example http://cufflinks.cbcb.umd.edu/howitworks.html#hsbi. This is described for Cufflinks, but I would imagine it applies to Cuffdiff as well.

11.0 years ago
Nick ▴ 290

Istvan is right: sequencing bias is one possible reason. The other (to me, more likely) is normalisation. The FPKM values that cuffdiff reports are normalised, but the normalisation factors depend on all the samples included in the run. In your case the sets of samples are not identical, which can lead to different normalisation factors and therefore different FPKM values.
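To illustrate how a cross-sample normalisation makes sample a's values depend on its companions, here is a minimal Python sketch of DESeq-style median-of-ratios size factors. This scheme is an assumption chosen for illustration, not cuffdiff's actual code, and the exact method cuffdiff applies may depend on its version and options. The counts are hypothetical. Each gene's reference value is the geometric mean across the samples in the run, so swapping b for c changes a's size factor.

```python
import math
from statistics import median


def size_factors(counts):
    """DESeq-style median-of-ratios size factors (illustrative sketch).

    counts maps a sample name to a list of per-gene read counts; every
    list must use the same gene order.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) == 0:
            # A zero count makes the geometric mean zero, so skip this gene.
            continue
        # Log of this gene's geometric mean across the samples in *this* run only.
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    # Each sample's factor is its median ratio to the per-gene geometric means.
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}


# Hypothetical per-gene counts for three samples (same three genes each).
a = [100, 200, 300]
b = [200, 400, 600]
c = [100, 100, 100]

sf_ab = size_factors({"a": a, "b": b})
sf_ac = size_factors({"a": a, "c": c})
print(round(sf_ab["a"], 3))  # 0.707
print(round(sf_ac["a"], 3))  # 1.414
# Normalised count of the first gene in sample a under each pairing.
print(round(a[0] / sf_ab["a"], 1), round(a[0] / sf_ac["a"], 1))  # 141.4 70.7
```

With these made-up counts, a's size factor is about 0.707 when it is paired with b and about 1.414 when it is paired with c, so a's normalised value for the first gene moves from roughly 141 to roughly 71 even though a.bam itself never changed.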

11.0 years ago
Nick ▴ 290

By the way, I stopped using cuffdiff a few months ago because the FDR values produced by the most recent version did not seem at all convincing. I would recommend using edgeR or DESeq.
