What is the purpose of running Cufflinks without a reference annotation?
8.4 years ago
BioinfGuru ★ 2.1k

My task is to repeat the analysis of RNA-seq data as presented in a journal article, using the TopHat/Cufflinks pipeline.

For simplicity, I'll just mention the 4 controls.

The authors ran Cufflinks without a reference annotation on each control "to detect possible novel transcripts" --> then ran cuffmerge on the results --> and then say they ran Cufflinks again using the merged transcripts.gtf as the reference annotation. It seems overcomplicated.

Cufflinks requires a BAM file as input, but cuffmerge does not output a BAM file, so the only way I can see they did it is by rerunning Cufflinks on every sample a second time (a waste of time?), this time using the cuffmerge output as the reference annotation. That would mean rerunning cuffmerge afterward as well.

Surely "to detect possible novel transcripts" doesn't require running Cufflinks on everything twice. Isn't that the whole point of Cufflinks?

Thanks in advance. Kenneth

cufflinks reference annotation • 3.7k views

Hi, I don't really see what your question is here. You answered "What is the purpose of running Cufflinks without a reference annotation?" yourself with the line "to detect possible novel transcripts", so it's not clear to me what you are asking.

Also, a link to the original article would help in commenting on this.

8.4 years ago
ablanchetcohen ★ 1.2k

The first Cufflinks run is to generate a new annotation for each sample to discover novel transcripts. The Cuffmerge run is to merge together all the annotations for each individual sample to create one merged annotation of better quality. The second Cufflinks run is to quantify the transcripts based on the merged annotation file.
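
As a rough sketch of those three steps, the script below only prints the commands (a dry run), so it does not need Cufflinks installed. The sample names, directory names and genome.fa are placeholders, not values from the paper, and using -G for the second run is an assumption that matches "quantify based on the merged annotation"; cuffmerge writes merged.gtf into its output directory.

```shell
# Dry-run sketch of the assemble -> merge -> quantify workflow.
# Nothing is executed: each step is only printed.
# All sample names and paths below are placeholders.

samples="ctrl1 ctrl2 ctrl3 ctrl4"   # hypothetical names for the 4 controls
genome="genome.fa"                  # placeholder reference FASTA

print_pipeline() {
    # Step 1: assemble each sample with no annotation (neither -g nor -G)
    for s in $samples; do
        echo "cufflinks -o asm_$s $s/accepted_hits.bam"
    done

    # Step 2: cuffmerge takes a text file listing one transcripts.gtf per line
    echo "ls asm_*/transcripts.gtf > assemblies.txt"
    echo "cuffmerge -s $genome -o merged_asm assemblies.txt"

    # Step 3: quantify each sample against the merged annotation only (-G)
    for s in $samples; do
        echo "cufflinks -G merged_asm/merged.gtf -o quant_$s $s/accepted_hits.bam"
    done
}

print_pipeline
```

Note that cuffmerge runs once, between the two Cufflinks passes; it does not need to be repeated after the second pass.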

Yes, it is complicated, and the results will contain many false positives. More importantly, it's generally a waste of time unless you're working on a poorly annotated genome. For well-annotated genomes like the mouse, human, or Drosophila genomes, you shouldn't bother trying to discover novel transcripts. Just use the most recent annotation available.


"My task is to repeat the DATA analysis of RNA-seq data as presented in a journal article using the tophat cufflinks pipeline."

@kennethcondon2007 does not have a choice here :-)

8.4 years ago
BioinfGuru ★ 2.1k

Thank you all for the replies.

The paper: http://www.nature.com/nbt/journal/v32/n9/full/nbt.3001.html

The pipeline: https://s31.postimg.org/tkcichqkb/pipeline_5.png

Our group has pressed on, so... We completed the first Cufflinks run for each sample, then cuffmerge, and have attempted the second Cufflinks run (using the transcripts.gtf file from cuffmerge as the reference annotation) with the command:

cufflinks -g [path]/transcripts.gtf -b [path]/genome.fa -u --library-type fr-unstranded [path]/accepted_hits.bam

Is there anything wrong with the command? Should -g be uppercase -G? Should we remove the -b option?
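
For reference on the -g versus -G question: per the Cufflinks manual, -G/--GTF estimates abundance for the transcripts in the supplied annotation only, while -g/--GTF-guide uses the annotation to guide a new assembly and still reports novel transcripts. The helper below is a hypothetical illustration that just prints the command for either mode, reusing the [path] placeholders from the command above; which mode the paper's authors used is not stated in this thread.

```shell
# Prints (does not run) the second-pass command for the chosen mode.
# second_pass_cmd is a made-up helper name; [path] is left as a placeholder.
second_pass_cmd() {
    mode="$1"
    case "$mode" in
        quantify) flag="-G" ;;  # annotated transcripts only, no new assembly
        guided)   flag="-g" ;;  # annotation-guided assembly, novel transcripts kept
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "cufflinks $flag [path]/transcripts.gtf -b [path]/genome.fa -u --library-type fr-unstranded [path]/accepted_hits.bam"
}

second_pass_cmd quantify
second_pass_cmd guided
```

The -b option (fragment bias correction against the genome FASTA) is optional in both modes.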

The runs started fine (we have 4 computers available to take 2 runs each); however, they have all now failed with the following error:

Error: duplicate GFF ID 'CUFF.4.1' encountered! https://s32.postimg.org/7p43nd04l/Sup2.jpg
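
One way to look into an error like this is to check whether the GTF passed to Cufflinks contains the same transcript_id on more than one "transcript" record. That is only one plausible cause (an assumption, for example per-sample files being concatenated instead of merged), and the function name and toy GTF below are made up for illustration.

```shell
# List transcript_id values that occur on more than one "transcript" line
# of a GTF file. Assumes Cufflinks-style GTF with one transcript record per
# isoform; a file with exon records only will report nothing.
dup_transcript_ids() {
    awk -F '\t' '
        $3 == "transcript" && match($9, /transcript_id "[^"]+"/) {
            # skip the 15-character prefix (transcript_id ") and the closing quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (++seen[id] == 2) print id
        }
    ' "$1"
}

# Toy input: CUFF.4.1 is declared twice, CUFF.5.1 once.
tmp="$(mktemp)"
printf 'chr1\tCufflinks\ttranscript\t100\t500\t1000\t+\t.\tgene_id "CUFF.4"; transcript_id "CUFF.4.1";\n' > "$tmp"
printf 'chr1\tCufflinks\texon\t100\t500\t1000\t+\t.\tgene_id "CUFF.4"; transcript_id "CUFF.4.1"; exon_number "1";\n' >> "$tmp"
printf 'chr2\tCufflinks\ttranscript\t900\t1500\t1000\t-\t.\tgene_id "CUFF.4"; transcript_id "CUFF.4.1";\n' >> "$tmp"
printf 'chr2\tCufflinks\ttranscript\t2000\t2500\t1000\t-\t.\tgene_id "CUFF.5"; transcript_id "CUFF.5.1";\n' >> "$tmp"
dup_transcript_ids "$tmp"   # prints: CUFF.4.1
rm -f "$tmp"
```

If the check prints nothing, the duplicate is coming from somewhere else and this sketch will not help.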

Also, one run that is still going has been stuck at the same point for over an hour: https://s31.postimg.org/yzbl2fu2z/Lee.jpg

Again, thank you in advance. Kenneth.


You should probably post this as a separate question.

8.4 years ago
BioinfGuru ★ 2.1k

Adding an annotation file during cuffmerge resolved the issue.
