Hi,
Recently we sequenced a bunch of RNA-Seq libraries, and I wanted to compare gene expression levels across all of them. So I ran cuffdiff on the tophat-aligned bam files with the standard annotation gtf file, and extracted the FPKM matrix using cummeRbund. What has me confused is that the reported FPKM values differ depending on the number of bam files under consideration. The differences are not huge, but they are there. Can anybody explain why this is happening and how to get around this issue?
I think I was not clear. My situation is as follows. I have 3 bam files: a.bam, b.bam, and c.bam. I have a single annotation file, annotation.gtf. If I run cuffdiff with this annotation file on a.bam and b.bam, I get one set of FPKM values, but if I then run it on a.bam and c.bam with the same annotation file and the same cuffdiff parameters, I get different FPKM values for the same genes under condition a! That has me flummoxed!
I understand now. I think this is probably caused by the sequencing bias estimation; see for example http://cufflinks.cbcb.umd.edu/howitworks.html#hsbi. It is described there for Cufflinks, but I would imagine it applies to Cuffdiff as well.
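To see how large the shift in condition a actually is between the two runs, a small comparison along these lines might help. This is only a sketch: it assumes each run's per-gene table is tab-delimited with a tracking_id column and one <label>_FPKM column per condition (which, as far as I know, is how Cuffdiff lays out genes.fpkm_tracking when the runs are labelled a,b and a,c). The function names and the inline numbers are made up for illustration and are not real output.

```python
import csv
import io


def read_fpkm(tracking_text, label):
    """Return {gene_id: FPKM} for one condition from fpkm_tracking-style text.

    Assumes a tab-delimited table with 'tracking_id' and '<label>_FPKM' columns.
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    column = f"{label}_FPKM"
    return {row["tracking_id"]: float(row[column]) for row in reader}


def relative_differences(run1, run2):
    """Per-gene |x - y| / mean(x, y) for genes present in both runs.

    Genes with zero FPKM in both runs are skipped to avoid dividing by zero.
    """
    diffs = {}
    for gene in run1.keys() & run2.keys():
        x, y = run1[gene], run2[gene]
        mean = (x + y) / 2
        if mean > 0:
            diffs[gene] = abs(x - y) / mean
    return diffs


# Made-up example tables standing in for the a+b run and the a+c run.
RUN_AB = "tracking_id\ta_FPKM\tb_FPKM\ngeneX\t10.0\t12.0\ngeneY\t0.0\t3.0\n"
RUN_AC = "tracking_id\ta_FPKM\tc_FPKM\ngeneX\t11.0\t8.0\ngeneY\t0.0\t1.0\n"

diffs = relative_differences(read_fpkm(RUN_AB, "a"), read_fpkm(RUN_AC, "a"))
for gene, diff in sorted(diffs.items()):
    print(gene, round(diff, 4))
```

With real data you would read the two files from disk instead of the inline strings. If the relative differences are small and fairly uniform across genes, that would be consistent with a run-level effect rather than a problem with individual genes.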