Are there any differences between TopHat and Cufflinks commands run with and without a GTF file?
9.0 years ago

Dear All,

I have a query regarding the gene annotation file (GTF).

  1. Tophat command without GTF:

    $ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
    
  2. Tophat command with GTF:

    $ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
    

What is the difference between the two tophat commands?

  1. Cufflinks command without GTF:

    $ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam
    
  2. Cufflinks command with GTF:

    $ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam
    

What is the difference between the two cufflinks commands?

Scenario 1 (TopHat command without GTF and Cufflinks command with GTF):

    $ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
    $ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

Scenario 2 (TopHat command with GTF and Cufflinks command without GTF):

    $ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
    $ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam

Scenario 3 (TopHat command with GTF and Cufflinks command with GTF):

    $ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
    $ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

Scenario 4 (TopHat command without GTF and Cufflinks command without GTF):

    $ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
    $ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam

What is the difference between scenarios 1, 2, 3 and 4?

Is the output of scenarios 1, 2, 3 and 4 the same or different?


Have you read the manual?


Hi Devon,

I read the manual, but it is still not clear to me.


Did Chirag's reply clarify things?


Hi Devon,

I have a better understanding now.

The reason I have four different scenarios is that I have seen people use these different combinations in different posts.

I am currently running all four combinations on my system, but I have not seen the results yet.

So I would like to know what I should expect from the output files of the four scenarios above.
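A practical note on running the four combinations: every command in the question writes to tophat_out and cufflinks_out, so running the scenarios one after another in the same working directory would overwrite the earlier results. The sketch below is an illustration, not something posted in this thread: it prints the four command pairs with a separate scenarioN/ directory per run (a hypothetical layout) and only executes them if DRY_RUN=0 is set (DRY_RUN is likewise a hypothetical switch). Because the reads are stranded, it also passes the same --library-type to Cufflinks, whose default is fr-unstranded. As in the question, Cufflinks gets -G; swap in -g if you want reference-guided assembly rather than quantification of the annotated transcripts only.

```shell
#!/bin/sh
# Sketch: print (or run) the four TopHat/Cufflinks GTF combinations,
# each writing into its own hypothetical scenarioN/ directory.
set -eu

GTF=${GTF:-genes.gtf}
INDEX=${INDEX:-reference_genome}
R1=${R1:-sample1_r1.fq}
R2=${R2:-sample1_r2.fq}
LIB=${LIB:-fr-firststrand}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

run() {
  # Always print the command; execute it only when DRY_RUN=0.
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run_scenario() {
  # $1: scenario number; $2 / $3: "yes" if TopHat / Cufflinks get -G "$GTF".
  dir="scenario$1"
  if [ "$2" = "yes" ]; then
    run tophat -p 8 --library-type "$LIB" -G "$GTF" -o "$dir/tophat_out" "$INDEX" "$R1" "$R2"
  else
    run tophat -p 8 --library-type "$LIB" -o "$dir/tophat_out" "$INDEX" "$R1" "$R2"
  fi
  if [ "$3" = "yes" ]; then
    run cufflinks -p 8 --library-type "$LIB" -G "$GTF" -o "$dir/cufflinks_out" "$dir/tophat_out/accepted_hits.bam"
  else
    run cufflinks -p 8 --library-type "$LIB" -o "$dir/cufflinks_out" "$dir/tophat_out/accepted_hits.bam"
  fi
}

all_scenarios() {
  run_scenario 1 no yes    # TopHat without GTF, Cufflinks with GTF
  run_scenario 2 yes no    # TopHat with GTF, Cufflinks without GTF
  run_scenario 3 yes yes   # both with GTF
  run_scenario 4 no no     # neither with GTF
}

all_scenarios
```

Saved under a hypothetical name such as run_scenarios.sh, `sh run_scenarios.sh` only lists the eight commands, while `DRY_RUN=0 sh run_scenarios.sh` would actually run them.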


In general, if your organism has a decent annotation then you'll get better results if you use it.

9.0 years ago
Chirag Parsania

Hi,

Please find answers to a few of your questions below.

What is the difference between the two tophat commands?

When you run TopHat with a GTF file, it first builds a transcriptome from the information in the GTF file. It then aligns the reads to this transcriptome rather than to the whole genome. Once the alignment to the transcriptome is finished, it aligns the remaining reads to the genome. That is how your alignment becomes faster; it is a kind of guided alignment.
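To make "builds a transcriptome from the GTF" more concrete: each transcript is defined by its exon records, and the transcript sequence is those exons joined in order, so its length is the sum of the exon lengths. The following is a minimal sketch, assuming a standard tab-separated GTF with a transcript_id attribute; it only summarises the transcript models in the annotation and does not reproduce what TopHat does internally. The function name transcript_models is hypothetical.

```shell
#!/bin/sh
# Sketch: summarise the transcript models described by a GTF file.
set -eu

transcript_models() {
  # $1: GTF file. Prints one line per transcript:
  # transcript_id, chrom, strand, exon count, spliced length
  # (sum of exon lengths, i.e. the length of the transcript sequence).
  awk -F'\t' '
    $3 == "exon" {
      if (match($9, /transcript_id "[^"]+"/)) {
        id = substr($9, RSTART + 15, RLENGTH - 16)   # text between the quotes
        chrom[id] = $1; strand[id] = $7
        n[id]++
        len[id] += $5 - $4 + 1                       # GTF: 1-based, inclusive
      }
    }
    END {
      for (id in n) printf "%s\t%s\t%s\t%d\t%d\n", id, chrom[id], strand[id], n[id], len[id]
    }
  ' "$1" | sort
}
```

For example, `transcript_models genes.gtf | head` lists the first few transcripts with their exon counts and spliced lengths.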

What is the difference between the two cufflinks commands?

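Again, the answer is similar to the one above: the GTF file guides Cufflinks when it builds the assembly, and your final output will contain both known transcripts and novel transcripts built from your data. Strictly speaking, that is Cufflinks' reference-guided assembly option, -g (--GTF-guide); with -G (--GTF), as in your command, Cufflinks only estimates the expression of the reference transcripts and does not assemble novel ones.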

Please refer to the Cufflinks manual: http://cole-trapnell-lab.github.io/cufflinks/cufflinks/index.html

I hope you can work out the other two questions yourself.

Cheers,
Chirag


Thanks Chirag for your explanation.

  1. TopHat command without GTF: the reads are aligned directly to the reference genome. All spliced alignments in the generated accepted_hits.bam file are based on exon-exon junctions that TopHat discovers on its own, i.e. they are all treated as novel junctions.

  2. TopHat command with GTF: a junction database is created from the GTF file. TopHat then aligns reads that do not map within an exon against the junction database to identify spliced read alignments. If an alignment is still not found in the junction database, the junction is considered a novel exon-exon junction. The generated accepted_hits.bam file therefore contains two kinds of spliced alignments: those over junctions from the GTF and those over novel exon-exon junctions.
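The junction-database idea can be illustrated with a minimal sketch: the introns of a transcript are the gaps between its consecutive exons, and an observed junction counts as "known" if it matches one of them and "novel" otherwise. The function names and the four-column junction layout (chromosome, first and last intron base in 1-based coordinates, strand) are assumptions for illustration; this is not TopHat's internal junction database, nor the format of its junctions.bed output.

```shell
#!/bin/sh
# Sketch: derive annotated junctions (introns) from a GTF and label
# observed junctions as known or novel.
set -eu

gtf_junctions() {
  # $1: GTF. Prints unique "chrom<TAB>intron_start<TAB>intron_end<TAB>strand".
  awk -F'\t' -v OFS='\t' '$3 == "exon" && match($9, /transcript_id "[^"]+"/) {
      print substr($9, RSTART + 15, RLENGTH - 16), $1, $4, $5, $7
    }' "$1" |
    sort -k1,1 -k3,3n |                  # group by transcript, exons by start
    awk -F'\t' -v OFS='\t' '
      $1 == prev_id { print $2, prev_end + 1, $3 - 1, $5 }   # gap = intron
      { prev_id = $1; prev_end = $4 }
    ' |
    sort -u
}

classify_junctions() {
  # $1: GTF, $2: file of observed junctions in the same 4-column layout.
  # Prints each observed junction followed by "known" or "novel".
  known=$(mktemp)
  gtf_junctions "$1" > "$known"
  awk -F'\t' -v OFS='\t' -v ref="$known" '
    FILENAME == ref { seen[$0] = 1; next }
    { print $0, (($0 in seen) ? "known" : "novel") }
  ' "$known" "$2"
  rm -f "$known"
}
```

Comparing against the annotation file explicitly (FILENAME) rather than the common NR == FNR idiom keeps the labels correct even when the GTF yields no junctions at all.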

TopHat is clear to me now, but I am still unsure about Cufflinks: what is the difference between -G GTF and -g GTF?


Cufflinks actually has both options. Basically, what Cufflinks tries to do is build a transcript GTF file from your data. Without a GTF option, Cufflinks assembles the transcripts based only on your reads. With -g (--GTF-guide), it performs a guided assembly, somewhat like performing a de novo assembly with a reference genome. With -G (--GTF), it does not assemble novel transcripts at all; it only estimates the expression of the transcripts in the GTF file.
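The three ways of calling Cufflinks can be summarised in a small sketch. The flags follow the Cufflinks manual (-g/--GTF-guide for reference-guided assembly, -G/--GTF for quantifying only the reference transcripts); the function name cufflinks_cmd and the mode names denovo, guided and quantify are hypothetical labels, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: print the Cufflinks command for each way of using an annotation.
set -eu

cufflinks_cmd() {
  # $1: mode (denovo | guided | quantify), $2: GTF, $3: BAM, $4: output dir
  case "$1" in
    denovo)   echo cufflinks -p 8 -o "$4" "$3" ;;           # reads only
    guided)   echo cufflinks -p 8 -o "$4" -g "$2" "$3" ;;   # known + novel transcripts
    quantify) echo cufflinks -p 8 -o "$4" -G "$2" "$3" ;;   # reference transcripts only
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

For example, `cufflinks_cmd guided genes.gtf tophat_out/accepted_hits.bam cufflinks_out` prints the guided-assembly command for the files used in the question.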


Hi Sam,

Thanks for your explanation. It makes sense now.

Does the output from Cufflinks with GTF and without GTF differ?

I have a GTF file for mouse. Which of the above scenarios should I use for my analysis?


I can assure you that you will get a completely different output. Probed.


Yes, most likely a different output will be generated. If you are working on mouse, use the mouse GTF so that you can perform the guided assembly.
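To see how much the outputs actually differ, one simple check is to count the genes and transcripts that Cufflinks reports in each run's transcripts.gtf. This is a minimal sketch assuming each run wrote to its own directory (for example scenario1/cufflinks_out, a hypothetical layout) and that transcripts.gtf contains "transcript" records carrying gene_id attributes; the function names count_models and compare_runs are hypothetical. Counts alone will not tell you which run is better, but they give a first impression of how different the assemblies are.

```shell
#!/bin/sh
# Sketch: tabulate gene and transcript counts for several Cufflinks runs.
set -eu

count_models() {
  # $1: a Cufflinks transcripts.gtf. Prints "<genes><TAB><transcripts>".
  awk -F'\t' '
    $3 == "transcript" {
      t++
      if (match($9, /gene_id "[^"]+"/))
        genes[substr($9, RSTART + 9, RLENGTH - 10)] = 1   # text between the quotes
    }
    END { g = 0; for (k in genes) g++; printf "%d\t%d\n", g, t }
  ' "$1"
}

compare_runs() {
  # Arguments: one or more Cufflinks output directories.
  printf 'run\tgenes\ttranscripts\n'
  for dir in "$@"; do
    printf '%s\t%s\n' "$dir" "$(count_models "$dir/transcripts.gtf")"
  done
}
```

With the hypothetical layout above, `compare_runs scenario1/cufflinks_out scenario2/cufflinks_out scenario3/cufflinks_out scenario4/cufflinks_out` prints one row per scenario.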
