Hi. I have three samples. I used the new cufflinks pipeline to run the analysis, and in the end I get normalized FPKM values per gene from cuffnorm. However, I would like a single average value for each gene, since the three samples are from the same strain. Is it fine to just average across the three, or is there a way within cufflinks to get the average FPKM value?
Looking at the FPKM values for the three samples, each gene has very similar values across all three.
I was thinking of either averaging the FPKM values of the three samples for each gene, or running cuffquant on the merged GTF file and the merged BAM files (of the three samples), so that I get a single FPKM value per gene. Which of the two do you think is better?
The goal is to compare the genes across different strains, so I would like to get an average value for each gene and each strain.
If you want to compare strains, then averaging is not what you want to do. You'll want to keep your replicates and run cuffdiff on the replicates.
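As a rough sketch of what "keep your replicates" means on the command line: cuffdiff takes one comma-separated list of replicate files per condition, after the GTF. The small Python helper below only assembles that argument list. The helper name, the file names, and the output directory are hypothetical, and I am assuming the replicates were quantified with cuffquant into .cxb files.

```python
def build_cuffdiff_command(gtf, replicates_by_strain, output_dir="diff_out"):
    """Assemble a cuffdiff argument list that keeps replicates grouped.

    replicates_by_strain maps a strain label to its replicate files.
    All file names passed in here are placeholders for illustration.
    """
    # -L takes the condition labels, comma-separated, in the same order
    # as the replicate groups that follow the GTF.
    labels = ",".join(replicates_by_strain)
    command = ["cuffdiff", "-o", output_dir, "-L", labels, gtf]
    for files in replicates_by_strain.values():
        # Replicates of one strain are joined by commas into one argument.
        command.append(",".join(files))
    return command


# Hypothetical file names: two strains, three replicates each.
command = build_cuffdiff_command(
    "merged.gtf",
    {
        "strainA": ["A1.cxb", "A2.cxb", "A3.cxb"],
        "strainB": ["B1.cxb", "B2.cxb", "B3.cxb"],
    },
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` if cuffdiff is on your PATH. More strains are added as further entries in the dictionary.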
cuffdiff gives pairwise comparisons among the samples, and I want to compare 4 different strains at the same time (not 2 at a time).
What if I merge the samples of each strain, run cuffquant with the merged.gtf, and once I have the abundances.cxb for all the different strains, run cuffnorm? In that case I could merge the transcripts.gtf files of the different strains. Does that sound reasonable?
Or I could average the FPKM values (from cuffnorm) for the 3 replicates of the same strain. I think averaging should be fine, as the 3 replicates are exactly the same experiment, prepared 3 times with the same library size. Also, the cuffnorm documentation mentions that its FPKM values are comparable between samples. What do you think?
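For reference, the averaging option would look roughly like this in plain Python. This is only a sketch under assumptions: it expects a tab-separated table with the gene ID in the first column and the remaining columns named `<strain>_<replicate>` (for example A_0, A_1, A_2), and the function name and the numbers are made up for illustration. It says nothing about whether the averaged values are suitable for statistical testing.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_replicates(table_text):
    """Return {gene: {strain: mean FPKM}} from a tab-separated FPKM table.

    Assumes the first column holds the gene ID and every other column
    is named <strain>_<replicate>, e.g. A_0, A_1, A_2.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    # Strain label = column name without the trailing _<replicate>.
    strains = [name.rsplit("_", 1)[0] for name in header[1:]]
    averages = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        per_strain = defaultdict(list)
        for strain, value in zip(strains, row[1:]):
            per_strain[strain].append(float(value))
        averages[row[0]] = {s: mean(v) for s, v in per_strain.items()}
    return averages


# Made-up values for illustration only.
example = (
    "tracking_id\tA_0\tA_1\tA_2\tB_0\tB_1\tB_2\n"
    "gene1\t10.0\t12.0\t14.0\t1.0\t2.0\t3.0\n"
    "gene2\t0.0\t0.0\t3.0\t5.0\t5.0\t5.0\n"
)
result = average_replicates(example)
print(result["gene1"])
```

To use it on a real file, read the file contents into a string and pass that string in. Note how gene2 in strain A averages to 1.0 even though two replicates are 0, which is one reason the mean alone hides replicate variability.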
To compare among strains, I will do an extra normalization. I want to get an approximate value per gene per strain. Which of the two approaches I mentioned fits best? Or can you suggest something that would let me compare all 4 strains at once?
What do you mean by "compare": finding statistically significant genes, or just a graphical representation via PCA or clustering?