The transcripts.gtf file is one of the inputs to cuffdiff. This file could be either:
- the reference transcriptome from, say, iGenomes
- the merged predicted gtf file resulting from a cuffmerge (which in turn takes input from cufflinks)
The first approach avoids cufflinks/cuffmerge entirely. As I understand it, both are valid options; is that right?
My question is, when following a reference-based pipeline (as opposed to reference-guided or de novo), how does one decide whether it is appropriate to follow a tophat->cuffdiff approach rather than a tophat->cufflinks->cuffmerge->cuffdiff approach? What are the advantages of the former method (besides being less time-consuming), and in particular, what are the disadvantages?
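To make the comparison concrete, here is a rough sketch of the two command sequences, written as a small Python script that only builds and prints the commands (it runs nothing). All file names (genes.gtf, genome.fa, the BAM paths, assemblies.txt, the output directories) and both function names are placeholders I made up. I am assuming tophat has already produced the BAM files, and that cufflinks is run with -g (annotation as a guide) in the longer route; check the flags against the manual for your installed version.

```python
import shlex
from typing import List


def reference_only(ref_gtf: str, cond_a: List[str], cond_b: List[str]) -> List[List[str]]:
    """tophat -> cuffdiff: test directly against the reference annotation."""
    # cuffdiff takes the GTF, then one comma-separated list of replicates per condition.
    return [["cuffdiff", "-o", "diff_out", ref_gtf, ",".join(cond_a), ",".join(cond_b)]]


def with_assembly(ref_gtf: str, genome_fa: str,
                  cond_a: List[str], cond_b: List[str]) -> List[List[str]]:
    """tophat -> cufflinks -> cuffmerge -> cuffdiff: test against a merged assembly."""
    cmds = []
    # One cufflinks run per sample; -g uses the annotation as a guide (assumption).
    for i, bam in enumerate(cond_a + cond_b):
        cmds.append(["cufflinks", "-g", ref_gtf, "-o", f"cl_out_{i}", bam])
    # assemblies.txt is a hand-written list of the cl_out_*/transcripts.gtf paths.
    cmds.append(["cuffmerge", "-g", ref_gtf, "-s", genome_fa,
                 "-o", "merged_asm", "assemblies.txt"])
    # cuffdiff now receives the merged GTF instead of the reference GTF.
    cmds.append(["cuffdiff", "-o", "diff_out", "merged_asm/merged.gtf",
                 ",".join(cond_a), ",".join(cond_b)])
    return cmds


if __name__ == "__main__":
    a = ["a1/accepted_hits.bam", "a2/accepted_hits.bam"]
    b = ["b1/accepted_hits.bam", "b2/accepted_hits.bam"]
    for name, cmds in [("reference only", reference_only("genes.gtf", a, b)),
                       ("with assembly", with_assembly("genes.gtf", "genome.fa", a, b))]:
        print(f"# {name}")
        for cmd in cmds:
            print(" ".join(shlex.quote(part) for part in cmd))
```

The only difference cuffdiff itself sees between the two routes is which GTF it is handed: the reference annotation, or merged.gtf from cuffmerge.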
Please correct me if I'm wrong, but isn't this precisely the difference between a reference-based and a reference-guided pipeline, namely that in the former one is not interested in transcripts absent from the reference transcriptome?
You are correct.
OK, but then my question still stands: in a reference-based pipeline, is there any reason to include cufflinks/cuffmerge at all?