Question

merged.gtf or combined.gtf

3

13.9 years ago

Harshal ▴ 60

What is the difference between merged.gtf (from cuffmerge) and combined.gtf (from cuffcompare)? Does the output of cuffdiff differ depending on which of the two is used?

rna seq cuffmerge • 7.0k views

updated 13.9 years ago by Damian Kao 16k • written 13.9 years ago by Harshal ▴ 60

1

What have you found out so far?

13.9 years ago by Egon Willighagen 5.4k

score 4 · Answer 1 · 2011-11-26

In cuffmerge, your GTF annotations are converted to SAM and then assembled together with cufflinks to output a merged GTF annotation. Cuffmerge makes no assumptions about whether transcripts from separate assemblies are actually the same transcript.

In cuffcompare, the software tries to guess whether transcripts from different annotation files are the same transcript, so that an expression comparison can be made. It does so by looking at the coordinates and order of the introns (not much more detail is given on how it does this). It then reports what it thinks is the same transcript among all the compared files as a single transcript in combined.gtf. So combined.gtf should only contain transcripts that cuffcompare guessed to be present in all the input files.
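To make the idea of matching on intron coordinates and order concrete, here is a minimal Python sketch. It is only an illustration, not cuffcompare's actual algorithm: the function names and the exon coordinates are hypothetical, and it ignores chromosome and strand and treats single-exon transcripts as unmatchable.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive as in GTF


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the ordered introns implied by a transcript's exons."""
    ordered = sorted(exons)
    # Each intron spans the gap between one exon's end and the next exon's start.
    return tuple(
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


def same_intron_chain(a: List[Exon], b: List[Exon]) -> bool:
    """True if both transcripts have the same non-empty intron chain."""
    chain_a = intron_chain(a)
    return len(chain_a) > 0 and chain_a == intron_chain(b)


# Hypothetical transcripts from three different assemblies.
sample1 = [(100, 200), (300, 400), (500, 600)]
sample2 = [(120, 200), (300, 400), (500, 650)]  # only the outer ends differ
sample3 = [(100, 200), (350, 400), (500, 600)]  # a different splice site

print(same_intron_chain(sample1, sample2))
print(same_intron_chain(sample1, sample3))
```

In this toy model, sample1 and sample2 count as the same transcript because only their outer exon boundaries differ, while sample3 uses a different splice site and so is treated as distinct.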

score 2 · Answer 2 · 2011-10-01

I'm pretty sure that the cufflinks developers intended to replicate the SQL terms JOIN and UNION when building cuffmerge and cuffcompare, respectively. From the cufflinks manual, regarding merging: "cuffmerge produces a GTF file that contains an assembly that merges together the input assemblies", and regarding comparing: "Cuffcompare reports a GTF file containing the "union" of all transfrags in each sample. If a transfrag is present in both samples, it is thus reported once in the combined gtf". So cuffmerge should be used just to JOIN the entries of the cufflinks GTF output files without filtering them, and cuffcompare should be used to get the UNION of those entries by removing duplicates.
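The analogy in this answer can be sketched in Python on toy data. This only illustrates the distinction being drawn (keep every entry versus report shared entries once); it does not reproduce what either tool does internally, and the transfrag IDs, intron chains, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

IntronChain = Tuple[Tuple[int, int], ...]
Transfrag = Tuple[str, IntronChain]  # (transfrag id, ordered introns)

# Hypothetical transfrags from two samples; A.1 and B.1 share an intron chain.
sample_a: List[Transfrag] = [
    ("A.1", ((201, 299), (401, 499))),
    ("A.2", ((1201, 1299),)),
]
sample_b: List[Transfrag] = [
    ("B.1", ((201, 299), (401, 499))),
    ("B.2", ((2201, 2299),)),
]


def concatenate(*samples: List[Transfrag]) -> List[Transfrag]:
    """Keep every entry from every sample, with no filtering."""
    return [transfrag for sample in samples for transfrag in sample]


def union_by_chain(*samples: List[Transfrag]) -> Dict[IntronChain, List[str]]:
    """Report each distinct intron chain once, remembering which ids share it."""
    merged: Dict[IntronChain, List[str]] = {}
    for sample in samples:
        for transfrag_id, chain in sample:
            merged.setdefault(chain, []).append(transfrag_id)
    return merged


print(len(concatenate(sample_a, sample_b)))     # every entry kept
print(len(union_by_chain(sample_a, sample_b)))  # shared entry reported once
```

Here the unfiltered list keeps all four entries, while the union reports the chain shared by A.1 and B.1 only once, giving three entries.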