Question

How to get per sample read counts/FPKM values from cufflinks?

8.4 years ago

EpiExplorer ▴ 90

I have a couple of questions regarding cufflinks. I ran cufflinks transcript assembly for 12 human cell lines, merged the transcript.gtf files from all samples with the reference transcript file using cuffmerge, and then ran cuffdiff. This analysis reported 163 differentially expressed genes out of a total of around 60k genes. The cuffdiff output files do not give per-sample read counts or FPKM values, so I am not sure how to validate the significant genes by comparing the read counts in each sample.

Also, the publication by the Trapnell group mentions that for a well-annotated organism such as human or mouse, transcript assembly is not important if you are only looking at differential expression. So I also followed the alternate protocol described in that paper, where the alignment is done with novel splice junction detection turned off in tophat and cuffdiff is then run without the transcript assembly step. This analysis reported 141 differentially expressed genes, but again I don't know how to get per-sample read counts and FPKM data from the cufflinks pipeline.

The cuffdiff output from my first analysis (with transcript assembly) does not report any significantly differentially spliced genes. Is this normal? (I did reference-guided assembly, and tophat was run with the novel splice junction option on.)

The cufflinks pipeline, although described as robust in many papers, does not seem to be transparent about what it is doing. I am wondering whether I should choose a method of differential expression analysis other than cufflinks. Any suggestions on this would be helpful.

RNA-Seq • 5.0k views

ADD COMMENT • link updated 8.4 years ago by GouthamAtla 12k • written 8.4 years ago by EpiExplorer ▴ 90

The tuxedo pipeline is indeed not the best route to follow. I would suggest you get raw read counts from your BAM files using featureCounts from the Subread package, and do a sound statistical analysis with edgeR or limma.
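As a rough illustration of the first half of that suggestion, here is a minimal Python sketch that turns a featureCounts output table into a per-sample count matrix ready to hand to edgeR or limma. It assumes the default featureCounts layout (one leading `#` comment line, then the columns Geneid, Chr, Start, End, Strand, Length followed by one integer count column per BAM file). The function name `read_featurecounts` and the example gene and file names are made up for illustration.

```python
import csv
import io


def read_featurecounts(handle):
    """Parse featureCounts output into (sample_names, {gene_id: [counts]}).

    Assumes the default layout: a '#' comment line, a header row, and six
    annotation columns before the per-BAM count columns.
    """
    # Drop the comment line that records the featureCounts command.
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    samples = header[6:]  # everything after Geneid..Length is one column per BAM
    counts = {row[0]: [int(v) for v in row[6:]] for row in rows if row}
    return samples, counts


# Made-up example data, not real results.
example = (
    "# Program:featureCounts; Command: ...\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tcellA.bam\tcellB.bam\n"
    "GENE1\tchr1\t100\t900\t+\t801\t120\t95\n"
    "GENE2\tchr2\t50\t450\t-\t401\t0\t14\n"
)
samples, counts = read_featurecounts(io.StringIO(example))
print(samples)
print(counts["GENE1"])
```

With a real file you would pass `open("counts.txt")` instead of the `io.StringIO` example; the path is a placeholder.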

ADD REPLY • link 8.4 years ago by Benn 8.3k

That's opinion, not an answer. If you're going to suggest an alternate method at least substantiate your claim.

ADD REPLY • link 8.4 years ago by User 59 13k

Yes, it was a comment not an answer, very sharp.

If you read the questions and answers on this website that involve RPKM and sound statistics, you'll see that most of them recommend NOT using RPKM, and using raw read counts instead in combination with something like edgeR or DESeq.

But don't believe the opinion of people on this website, why should you? (I am being sarcastic in case it is not clear). There is also literature available, e.g., http://www.genomebiology.com/2013/14/9/R95

ADD REPLY • link 8.4 years ago by Benn 8.3k

The OP is clearly not experienced, or they would not be asking the question or following an out-of-date paper, one that most people will attempt to use while developing their RNA-Seq analysis skills. Asking you to substantiate your comments was an attempt to help the OP, not to have a go at you.

ADD REPLY • link 8.4 years ago by User 59 13k

I am still waiting for some answers :( . Thanks b.nota for the comments.

ADD REPLY • link 8.4 years ago by EpiExplorer ▴ 90

score 0 · Answer 1 · 2016-06-26

8.4 years ago

GouthamAtla 12k

As far as I remember, the per-sample read counts are available in the cufflinks/cuffdiff output. They are in the "tracking files".
An alternative pipeline would be to carry out gene-level quantification using HTSeq-count / featureCounts and then use edgeR or DESeq2 for the DE analysis.
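To make the tracking-file route concrete, here is a hedged Python sketch that pivots cuffdiff's `genes.read_group_tracking` file into one row per gene and one column per replicate. It assumes the tab-separated columns `tracking_id`, `condition`, `replicate`, `raw_frags` and `FPKM` are present, as I recall them from cuffdiff output; check the header of your own file first. The helper name `per_sample_table` and the example values are made up.

```python
import csv
import io
from collections import defaultdict


def per_sample_table(handle, value_column="FPKM"):
    """Return {gene_id: {"<condition>_<replicate>": value}} from a
    read_group_tracking-style table (column names are an assumption)."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = defaultdict(dict)
    for row in reader:
        sample = f"{row['condition']}_{row['replicate']}"
        table[row["tracking_id"]][sample] = float(row[value_column])
    return dict(table)


# Made-up example data, not real results.
example = (
    "tracking_id\tcondition\treplicate\traw_frags\tinternal_scaled_frags\t"
    "external_scaled_frags\tFPKM\teffective_length\tstatus\n"
    "GENE1\tq1\t0\t120\t118.2\t118.2\t10.5\t-\tOK\n"
    "GENE1\tq1\t1\t95\t97.1\t97.1\t8.25\t-\tOK\n"
    "GENE1\tq2\t0\t300\t290.4\t290.4\t25.0\t-\tOK\n"
)
fpkm = per_sample_table(io.StringIO(example))
raw = per_sample_table(io.StringIO(example), value_column="raw_frags")
print(fpkm["GENE1"])
print(raw["GENE1"])
```

Passing `value_column="raw_frags"` gives the per-replicate fragment counts instead of FPKM, which is closer to what you would compare against a count-based tool.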

ADD COMMENT • link 8.4 years ago by GouthamAtla 12k

Thank you. Yes, the tracking files do have this information. But I was looking for an option to get the FPKM values for all samples together in one file. I figured out that cuffquant and cuffnorm do this.
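For anyone following along, a rough sketch of how those two steps chain together, written as Python helpers that only build the command lines (nothing is executed). It assumes the usual invocation shape: cuffquant takes an annotation GTF and one BAM and writes `abundances.cxb` into its output directory, and cuffnorm takes the GTF followed by one comma-separated list of CXB files per condition, with `-L` for condition labels and `-o` for the output directory. The helper names, file names and directory names are hypothetical.

```python
import shlex


def cuffquant_cmd(gtf, bam, out_dir):
    """Command line for quantifying one sample against a fixed annotation."""
    return ["cuffquant", "-o", out_dir, gtf, bam]


def cuffnorm_cmd(gtf, cxb_groups, labels, out_dir):
    """Command line for normalising all samples into shared tables.

    cxb_groups is a list of lists: one inner list of CXB paths per condition.
    """
    cmd = ["cuffnorm", "-o", out_dir, "-L", ",".join(labels), gtf]
    cmd += [",".join(group) for group in cxb_groups]
    return cmd


# Hypothetical file layout: two conditions, two and one replicates.
quant = cuffquant_cmd("merged.gtf", "cellA_rep1.bam", "cq_cellA_rep1")
norm = cuffnorm_cmd(
    "merged.gtf",
    [["cq_cellA_rep1/abundances.cxb", "cq_cellA_rep2/abundances.cxb"],
     ["cq_cellB_rep1/abundances.cxb"]],
    ["cellA", "cellB"],
    "cuffnorm_out",
)
print(shlex.join(quant))
print(shlex.join(norm))
```

The combined per-sample table should then be among the cuffnorm output files (for example `genes.fpkm_table`), but treat the exact column naming as something to verify on your own run.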

ADD REPLY • link 8.4 years ago by EpiExplorer ▴ 90

I ran cuffquant and cuffnorm on the files generated after the cufflinks transcript assembly and got table files with values for the individual samples. However, for the alternative workflow (tophat --> cufflinks skipped --> cuffdiff), cuffquant followed by cuffnorm does not generate FPKM values for each sample; I only get tracking files that have FPKM values for each group rather than for each sample. Has anyone come across this, and could someone please explain why the alternative method does not generate sample-wise FPKM even after running cuffquant and cuffnorm? (It does for the standard workflow files!)

ADD REPLY • link 8.4 years ago by EpiExplorer ▴ 90