My task is to reproduce the analysis of RNA-seq data presented in a journal article, using the TopHat/Cufflinks pipeline.
For simplicity I'll just mention the 4 controls.
The authors run Cufflinks without a reference annotation on each control "to detect possible novel transcripts", then run Cuffmerge on the results, and then say they run Cufflinks again using the merged transcripts.gtf as the reference annotation. It seems overcomplicated.
Cufflinks requires a BAM file as input, but Cuffmerge does not output a BAM file (it outputs a merged GTF). So the only way I can see they did it is by re-running Cufflinks on every sample a second time (a waste of time?), this time using the Cuffmerge output as the reference annotation. This would also mean re-running Cuffmerge afterward.
Surely "to detect possible novel transcripts" doesn't require running Cufflinks on everything twice. I mean, isn't this the whole point of Cufflinks?
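To make the workflow I am describing concrete, here is a rough Python sketch that only builds the command lines for the three steps (it does not run anything). This is my reading of the paper, not the authors' actual commands. The sample file names, the output directory layout, and the helper name `build_pipeline` are hypothetical, and I am assuming the second Cufflinks pass uses `-G` (quantify against the given annotation only) rather than `-g` (use it as a guide), since the paper only says "reference annotation".

```python
from pathlib import Path


def build_pipeline(bam_files, out_root="cufflinks_out"):
    """Build (but do not run) the commands for the two-pass workflow.

    Returns (commands, denovo_gtfs): a list of argument lists, and the
    per-sample transcripts.gtf paths that would be listed for cuffmerge.
    """
    out_root = Path(out_root)
    commands = []
    denovo_gtfs = []

    # Pass 1: assemble each sample without a reference annotation.
    # Cufflinks writes transcripts.gtf into the -o directory.
    for bam in bam_files:
        out_dir = out_root / "denovo" / Path(bam).stem
        commands.append(["cufflinks", "-o", str(out_dir), str(bam)])
        denovo_gtfs.append(out_dir / "transcripts.gtf")

    # Merge: cuffmerge takes a text file listing one GTF path per line
    # and writes merged.gtf into its -o directory.
    assembly_list = out_root / "assemblies.txt"  # hypothetical file name
    merged_dir = out_root / "merged_asm"
    commands.append(["cuffmerge", "-o", str(merged_dir), str(assembly_list)])
    merged_gtf = merged_dir / "merged.gtf"

    # Pass 2: re-run cufflinks on the same BAM files, now with the merged
    # GTF as the annotation (assumption: -G, not -g).
    for bam in bam_files:
        out_dir = out_root / "requant" / Path(bam).stem
        commands.append(
            ["cufflinks", "-G", str(merged_gtf), "-o", str(out_dir), str(bam)]
        )

    return commands, denovo_gtfs


if __name__ == "__main__":
    # Hypothetical BAM names standing in for the 4 controls.
    cmds, _ = build_pipeline(["ctrl1.bam", "ctrl2.bam", "ctrl3.bam", "ctrl4.bam"])
    for cmd in cmds:
        print(" ".join(cmd))
```

So for 4 controls this comes to 4 Cufflinks runs, 1 Cuffmerge run, and 4 more Cufflinks runs, which is the duplication I am asking about.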
Thanks in advance. Kenneth
Hi, I don't really see what your question is here. You answered "What is the purpose of running Cufflinks without a reference annotation?" yourself with the line "to detect possible novel transcripts", so it's not clear to me what you are asking.
Also, a link to the original article would help in commenting on this.