I have sequenced RNA from 12 human samples: 6 tumor samples (3 from tumor group A and 3 from tumor group B) and 6 matched non-tumor samples. Using tophat, I aligned the reads to hg19, and with cufflinks I assembled transcript models for each sample. I would like to extract the FPKM values for each sample in matrix format so that I can do hierarchical clustering and principal component analysis on all 12 samples.
The problem is that cufflinks assembles a different set of transcripts for each sample, so I cannot simply paste the cufflinks output files together to get the matrix. One idea I had was to take a reference transcript file and use bedtools/bedops to look for intersecting transcripts across all 12 samples. However, I hope I am overlooking some functionality in cufflinks/cuffcompare/cuffdiff that would make this easier.
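To make the goal concrete, here is a minimal Python sketch of the merge step I am after. It is only a sketch under assumptions: it presumes the transcript IDs are comparable across samples (for example because every sample was quantified against the same annotation, which is not what I have now), and that each per-sample table is tab-delimited with `tracking_id` and `FPKM` columns. The sample names, the toy values, and the choice to fill missing transcripts with 0 are all made up for illustration.

```python
import csv
import io


def read_fpkm(handle, id_col="tracking_id", value_col="FPKM"):
    """Return {transcript_id: FPKM} from one tab-delimited table.

    The column names are assumptions about the input layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[value_col]) for row in reader}


def build_matrix(tables, missing=0.0):
    """Combine {sample: {id: fpkm}} into (ids, samples, rows).

    Rows follow the sorted union of all IDs; a transcript absent from
    a sample gets `missing` (a simplification, not a real estimate).
    """
    samples = list(tables)
    ids = sorted(set().union(*(t.keys() for t in tables.values())))
    rows = [[tables[s].get(i, missing) for s in samples] for i in ids]
    return ids, samples, rows


# Hypothetical toy input standing in for two per-sample tables.
demo = {
    "tumorA_1": "tracking_id\tFPKM\nT1\t5.0\nT2\t1.5\n",
    "normal_1": "tracking_id\tFPKM\nT2\t2.0\nT3\t0.5\n",
}
tables = {name: read_fpkm(io.StringIO(text)) for name, text in demo.items()}
ids, samples, rows = build_matrix(tables)

# Print the matrix as tab-separated text, one transcript per row.
print("\t".join(["id"] + samples))
for transcript_id, values in zip(ids, rows):
    print("\t".join([transcript_id] + [str(v) for v in values]))
```

With real data the `io.StringIO` objects would be replaced by open file handles, one per sample. The outer join only makes sense once the IDs mean the same thing in every sample, which is exactly the part I am missing.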
Thanks, I will try STAR + HTSeq + edgeR alongside tophat + cufflinks + cuffdiff.