Get Normalized Read Count Sample Matrix Tophat/Cufflinks
11.7 years ago
Irsan ★ 7.8k

I have sequenced RNA from 12 human samples: 6 tumor samples (3 from tumor group A and 3 from tumor group B) and 6 matched non-tumor samples. Using tophat, I aligned the reads to hg19, and with cufflinks I built transcript models for each sample. I would like to extract the FPKM values for each sample in matrix format so that I can do hierarchical clustering and principal component analysis on all 12 samples.

The problem is that cufflinks assembles different transcripts for each sample, so I cannot just paste the cufflinks files together to get the matrix. One idea I had was to take a reference transcript file and use bedtools/bedops to look for intersecting transcripts in all 12 samples. However, I hope I am overlooking some functionality in cufflinks/cuffcompare/cuffdiff that gets this done more easily.

tophat cufflinks rna-seq clustering pca • 6.3k views
11.7 years ago
biopaw ▴ 30

The CummeRbund R package may be what you need. A colleague of mine uses it (I don't). It continues the workflow by creating a database from the outputs of Tophat/Cuffdiff and implements several plotting functions. With the tophat suite the workflow is more tightly controlled (hence CummeRbund), which may be great for the casual user.

However, I would recommend STAR for alignment (if you have a computer with at least 36G RAM; you need 16G for the human index), HTSeq for counting, and edgeR for the statistics (there are other good R packages for sequencing data as well). Your RNA-seq alignment would finish in a few minutes instead of a few hours. You trade the convenience of the more rigid workflow for a more flexible one in R (more work), but you can take advantage of the other packages in Bioconductor.

With edgeR, you could model the A vs B effect directly by creating a contrast A[T-NT] - B[T-NT], whereas in Cuffdiff it seems you can only model direct comparisons, T vs NT for example. You can also make any plot you like using the Bioconductor tools in R, so you can then do the hierarchical clustering, PCA, MDS etc.; ggplot is a nice plotting tool in R.
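To make the contrast A[T-NT] - B[T-NT] concrete, here is a minimal arithmetic sketch in Python. It is not how edgeR estimates the effect (edgeR fits a negative binomial GLM on counts and can account for the sample pairing); it only shows which group means the contrast combines. The log2 expression values and all names below are made up for illustration.

```python
from statistics import mean

# Hypothetical log2 expression values for ONE gene, three samples per group.
# Keys are (tumor group, tissue); the numbers are invented for illustration.
log2_expr = {
    ("A", "T"): [8.0, 8.2, 7.8],
    ("A", "NT"): [6.0, 6.1, 5.9],
    ("B", "T"): [7.0, 7.1, 6.9],
    ("B", "NT"): [6.5, 6.4, 6.6],
}

# Weights for the contrast (A_T - A_NT) - (B_T - B_NT).
contrast = {("A", "T"): 1, ("A", "NT"): -1, ("B", "T"): -1, ("B", "NT"): 1}


def contrast_estimate(values, weights):
    """Weighted sum of group means; positive means T vs NT differs more in A."""
    return sum(w * mean(values[group]) for group, w in weights.items())


estimate = contrast_estimate(log2_expr, contrast)
print(f"A[T-NT] - B[T-NT] = {estimate:.2f} (log2 scale)")
```

A value near zero would mean the tumor vs non-tumor difference is about the same in both groups.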

P


Thanks, I will try STAR + HTSeq + edgeR alongside tophat + cufflinks + cuffdiff

11.7 years ago
Ryan Thompson ★ 3.6k

You probably want to use cuffmerge to combine all the individual sample assemblies, and then re-run each sample using the merged assembly as a reference.
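A rough sketch of that two-step workflow, written as Python that only builds the command lines (nothing is executed here). The sample directories, BAM paths, thread count and output names are placeholders; the name of the merged GTF inside the cuffmerge output directory depends on the Cufflinks version, so it is passed in explicitly rather than assumed.

```python
import shlex
from pathlib import Path


def assembly_list_text(sample_dirs):
    """One transcripts.gtf path per line: the list file that cuffmerge reads."""
    return "".join(f"{(Path(d) / 'transcripts.gtf').as_posix()}\n" for d in sample_dirs)


def cuffmerge_cmd(list_path, out_dir="merged_asm", threads=4):
    """Merge the per-sample assemblies listed in list_path."""
    return ["cuffmerge", "-o", out_dir, "-p", str(threads), str(list_path)]


def cufflinks_cmd(bam, merged_gtf, out_dir, threads=4):
    """Re-quantify one sample against the merged assembly (--GTF: no new assembly)."""
    return ["cufflinks", "-p", str(threads), "--GTF", str(merged_gtf), "-o", out_dir, str(bam)]


# Placeholder sample layout; adjust to your own directories.
samples = ["sample01", "sample02"]
print(assembly_list_text(samples), end="")  # save this text as assemblies.txt
print(shlex.join(cuffmerge_cmd("assemblies.txt")))
for s in samples:
    cmd = cufflinks_cmd(f"{s}/accepted_hits.bam", "merged_asm/merged.gtf", f"{s}_requant")
    print(shlex.join(cmd))
```

Because every sample is then quantified against the same set of transcripts, the per-sample outputs share identifiers and can be combined into one matrix.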


That sounds like something I am looking for indeed :-)


Thanks Ryan, I ran cufflinks with --GTF (not --GTF-guide) with the transcripts.gtf from cuffmerge and it worked like a charm :-)
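For anyone who wants the matrix itself: once every sample has been quantified against the same merged GTF, the per-sample tracking files can be joined on the transcript ID. The sketch below assumes tab-separated files with `tracking_id` and `FPKM` columns, as in the `isoforms.fpkm_tracking` files that cufflinks writes; the in-memory demo data and sample names are made up.

```python
import csv
import io


def read_fpkm(handle, id_col="tracking_id", value_col="FPKM"):
    """Map transcript ID -> FPKM for one sample's tab-separated tracking file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[value_col]) for row in reader}


def build_matrix(sample_tables):
    """Return (header, rows) with one row per transcript present in all samples."""
    samples = list(sample_tables)
    shared = set.intersection(*(set(t) for t in sample_tables.values()))
    rows = [[tid] + [sample_tables[s][tid] for s in samples] for tid in sorted(shared)]
    return ["tracking_id"] + samples, rows


# Made-up demo data; in practice use open("sample01_requant/isoforms.fpkm_tracking").
demo = {
    "S1": "tracking_id\tgene_id\tFPKM\nTCONS_1\tXLOC_1\t10.5\nTCONS_2\tXLOC_2\t0\n",
    "S2": "tracking_id\tgene_id\tFPKM\nTCONS_1\tXLOC_1\t7.25\nTCONS_2\tXLOC_2\t3.0\n",
}
tables = {name: read_fpkm(io.StringIO(text)) for name, text in demo.items()}
header, rows = build_matrix(tables)

for line in [header] + rows:
    print("\t".join(str(x) for x in line))
```

Taking the intersection of IDs is only a safeguard; with the same reference GTF for all samples, every file should list the same transcripts.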
