I have 2 conditions each having 4 replicates. I ran tophat-cuffdiff twice - once using 2 groups of 4 replicates and once excluding one of the replicates for one of the conditions (i.e. one condition had only 3 replicates, the other had the same set of 4 replicates as before).
I was somewhat surprised to find out that the FPKM values differed substantially between these two analytic runs for BOTH conditions even though the data for one of the conditions was the same in both runs.
The only explanation I can think of is that cuffdiff estimates the FPKM values by pooling the data from both conditions. Does anyone know whether this is true?
Was this based on de-novo assembly?
No. I used a genome build as a reference, along with the corresponding GTF file.
I agree that this sounds undesirable. What happens if you run it multiple times on the exact same data?
I would try that as a sort of sanity check.
I stumbled upon http://seqanswers.com/forums/showthread.php?t=4606 which seems to address a similar question. Some of the posts there seem to confirm that the FPKM values reported are not absolute but, rather, incorporate some sort of normalisation across conditions. I would still find it helpful if anyone can give a more definitive confirmation.
It is definitely like that. I don't have a reference, but I have seen the same thing in my own runs.
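To illustrate why normalisation across conditions would shift the values for the untouched condition too, here is a minimal Python sketch. It is not Cuffdiff's code. It implements the median-of-ratios scaling used by DESeq, which I am assuming is similar in spirit to what Cuffdiff does when it normalises across libraries. The counts, the library names (A1..A4, B1..B4) and the function name `size_factors` are all made up for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per library (DESeq-style sketch).

    counts: dict mapping library name -> list of raw counts, one per gene.
    Genes with a zero count in any library are skipped.
    """
    libs = list(counts)
    n_genes = len(counts[libs[0]])
    log_ratios = {lib: [] for lib in libs}
    for g in range(n_genes):
        values = [counts[lib][g] for lib in libs]
        if min(values) == 0:
            continue
        # Log of the geometric mean of this gene across ALL libraries,
        # i.e. across both conditions. This is the pooling step.
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for lib, v in zip(libs, values):
            log_ratios[lib].append(math.log(v) - log_geo_mean)
    return {lib: math.exp(median(r)) for lib, r in log_ratios.items()}


# Invented toy counts: 5 genes, 4 replicates per condition.
counts = {
    "A1": [100, 200, 50, 400, 80],
    "A2": [110, 190, 55, 420, 75],
    "A3": [90, 210, 45, 380, 85],
    "A4": [105, 205, 52, 410, 78],
    "B1": [200, 100, 60, 800, 40],
    "B2": [210, 95, 58, 790, 42],
    "B3": [190, 105, 62, 810, 38],
    "B4": [400, 300, 150, 1500, 120],  # a deeper library
}

# Run 1: all eight libraries. Run 2: same data with replicate B4 excluded.
sf_all = size_factors(counts)
sf_dropped = size_factors({k: v for k, v in counts.items() if k != "B4"})

# Condition A's raw data is identical in both runs, yet its scaling changes.
for lib in ("A1", "A2", "A3", "A4"):
    print(lib, "size factor:", sf_all[lib], "vs", sf_dropped[lib])

# Normalised value of gene 0 in library A1 under each run.
print("A1 gene 0:", counts["A1"][0] / sf_all["A1"],
      "vs", counts["A1"][0] / sf_dropped["A1"])
```

In this toy case, removing one replicate from condition B changes the per-gene reference (the geometric mean over all libraries), so every remaining library gets a different scaling factor, including the four condition A libraries whose raw counts did not change. If Cuffdiff scales libraries in a comparable pooled way, that alone would account for FPKM values shifting in both conditions.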