Question

cuffcompare: No novel genes found. Is this possible?

0

6.7 years ago

sangram_keshari ▴ 260

I ran cuffcompare on the transcript assembly produced by Cufflinks, together with an annotation file. I was expecting some novel transcripts to be found, but it gave the output below (stats). Can anyone help me work out what may have gone wrong?

# Cuffcompare v2.2.1 | Command line was:
#cuffcompare CLV_2_transcripts.gtf -r ano.gff3
#

#= Summary for dataset: CLV_2_transcripts.gtf :
#     Query mRNAs :   53667 in   32615 loci  (42061 multi-exon transcripts)
#            (10524 multi-transcript loci, ~1.6 transcripts per locus)
# Reference mRNAs :   54124 in   32619 loci  (42815 multi-exon)
# Super-loci w/ reference transcripts:    32494
#--------------------|   Sn   |  Sp   |  fSn |  fSp  
    Base level:     100.0   100.0     -       - 
    Exon level:     114.2   114.6   100.0   100.0
  Intron level:     100.0   100.0   100.0   100.0
Intron chain level:      98.2   100.0   100.0   100.0
Transcript level:    98.4    99.2    87.9    88.7
   Locus level:     100.0   100.0   100.0   100.0

 Matching intron chains:   42061
          Matching loci:   32611

      Missed exons:       7/193010  (  0.0%)
       Novel exons:       0/192188  (  0.0%)
    Missed introns:       1/132525  (  0.0%)
     Novel introns:       0/132525  (  0.0%)
       Missed loci:       7/32619   (  0.0%)
        Novel loci:       0/32615   (  0.0%)

Total union super-loci across all input datasets: 32612
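For anyone checking a similar report, the rows to look at are the "Novel" counts. Below is a minimal sketch that pulls them out of a stats file. It assumes the report was written to `cuffcmp.stats` (cuffcompare's default output prefix when `-o` is not given); the function name `novel_counts` is made up for illustration, and the demo rows are copied from the output above.

```shell
#!/bin/sh
# Print the "Novel ..." rows of a cuffcompare .stats report as:
#   label <TAB> count <TAB> total
novel_counts() {
    awk -F: '/^[[:space:]]*Novel / {
        label = $1
        sub(/^[[:space:]]+/, "", label)
        split($2, parts, "/")              # parts[1] = count, parts[2] = "total  (pct)"
        count = parts[1]
        gsub(/[[:space:]]/, "", count)
        total = parts[2]
        sub(/[[:space:]].*$/, "", total)   # drop the trailing percentage
        printf "%s\t%s\t%s\n", label, count, total
    }' "$1"
}

# Demo on three rows copied from the report above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
       Novel exons:       0/192188  (  0.0%)
     Novel introns:       0/132525  (  0.0%)
        Novel loci:       0/32615   (  0.0%)
EOF
novel_counts "$demo"
rm -f "$demo"
```

On a real run, `novel_counts cuffcmp.stats` would be the call. All three counts being zero is the symptom discussed in this thread.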

RNA-Seq cuffcompare • 2.4k views

ADD COMMENT • link 6.6 years ago by sangram_keshari ▴ 260

0

Why were you expecting novel transcripts to be found? Which species is this? Which GTF guide have you used?

In addition, Cufflinks, CuffCompare, etc. are outdated. Unless there is some legacy reason why you need to use these programs, you should instead be using HISAT2 and StringTie.

ADD REPLY • link 6.7 years ago by Kevin Blighe 88k

0

Sorry for the incomplete information.

This is Arabidopsis. I used the latest version of the annotation (GTF guide) from Araport.

I have read the documentation of both pipelines (TopHat-Cufflinks and HISAT2-StringTie) in Nature Protocols. HISAT2 is definitely superior to TopHat in some benchmark reports, but I am using the STAR aligner.

I think most Cufflinks modules are still used more often than StringTie in recent publications. As for CuffCompare (which my question is about), it is the same program from which gffcompare, distributed separately alongside StringTie, was built. Because of a library installation issue with gffcompare, I am still using CuffCompare. I don't think this will cause many problems (I may be wrong).

I am expecting some novel transcripts because the sequencing depth of our samples is very high.

I am sure a mistake has happened somewhere, but I have not been able to figure it out.

ADD REPLY • link 6.7 years ago by sangram_keshari ▴ 260

0

Isn't cuffcompare normally used to compare between 2 or more assembled transcriptome GTFs? You appear to be just comparing a single sample (to itself?). The syntax is:

cuffcompare [options]* <cuff1.gtf> [cuff2.gtf] … [cuffN.gtf]

Take a look at the other available options here: http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/

ADD REPLY • link 6.7 years ago by Kevin Blighe 88k

0

No, actually I am comparing my sample transcriptome (GTF) with the annotation file (GFF3 in my case). This is supposed to report novel transcripts (class code j in the output files) that have not been reported before in the annotation file.
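To see which class codes actually turned up, one option is to tally the `class_code` attribute per transcript in the combined GTF. This is only a sketch: it assumes the default output name `cuffcmp.combined.gtf` and that each transcript carries `transcript_id` and `class_code` attributes on at least one of its lines. The function name `class_code_counts` is hypothetical.

```shell
#!/bin/sh
# Count transcripts per cuffcompare class code in a combined GTF.
# Each transcript is counted once, on the first line where both
# transcript_id and class_code are present.
class_code_counts() {
    awk '{
        if (!match($0, /transcript_id "[^"]+"/)) next
        tid = substr($0, RSTART + 15, RLENGTH - 16)   # text between the quotes
        if (!match($0, /class_code "[^"]+"/)) next
        code = substr($0, RSTART + 12, RLENGTH - 13)
        if (!(tid in seen)) {
            seen[tid] = 1
            n[code]++
        }
    }
    END {
        for (c in n) printf "%s\t%d\n", c, n[c]
    }' "$1" | LC_ALL=C sort
}

# Usage (file name is an assumption, see above):
#   class_code_counts cuffcmp.combined.gtf
```

In the cuffcompare documentation, `=` marks a complete intron-chain match, `j` a potentially novel isoform sharing at least one splice junction with a reference transcript, and `u` an unknown, intergenic transcript. If only `=` appears in the tally, nothing novel was assembled in the first place.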

ADD REPLY • link 6.7 years ago by sangram_keshari ▴ 260

1

Any novel transcripts would have been listed in the GTF produced by Cufflinks itself, at least in my experience. Had you used your GFF3 annotation file as the guide during alignment and assembly, then you would surely have identified novel transcripts. On that note, as you used STAR for alignment, the tags required by Cufflinks / Cuffcompare, i.e., those related to strand alignment, may not have been added to your aligned BAM.

I would re-align using TopHat2 / HISAT2, and then go from there.
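A quick way to check the strand-tag concern before re-aligning is to look for spliced alignments that lack the `XS:A:` attribute. The sketch below reads SAM records on standard input; the function name `spliced_without_xs` and the BAM file name in the usage comment are hypothetical, and `samtools` is assumed to be installed for the real-data case.

```shell
#!/bin/sh
# Read SAM records on stdin and print two tab-separated numbers:
#   spliced alignments (CIGAR contains N), and how many of those lack XS:A:+/-
spliced_without_xs() {
    awk -F'\t' '$1 !~ /^@/ && $6 ~ /N/ {
        spliced++
        if ($0 !~ /\tXS:A:[+-]/) missing++
    }
    END { printf "%d\t%d\n", spliced + 0, missing + 0 }'
}

# Usage on real data (hypothetical file name):
#   samtools view aligned.bam | head -n 100000 | spliced_without_xs
```

If the second number is large for unstranded data, the STAR manual describes `--outSAMstrandField intronMotif` as the option that adds the XS attribute Cufflinks expects on spliced reads, which may be less work than switching aligners.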

ADD REPLY • link 6.7 years ago by Kevin Blighe 88k

0

Yes, the mistake was in the assembly step. The reference annotation file should have been used as a guide for assembly, instead of just being used to quantify known transcripts. Thanks for the help :)

ADD REPLY • link 6.6 years ago by sangram_keshari ▴ 260

0

No problem bro.

ADD REPLY • link 6.6 years ago by Kevin Blighe 88k

0

What did you use for mapping, and how well did it go?

ADD REPLY • link 6.6 years ago by lakhujanivijay 5.9k

0

I used STAR for mapping and it was successful, with more than 85% of reads aligning to the reference genome.

ADD REPLY • link 6.6 years ago by sangram_keshari ▴ 260

score 2 · Accepted Answer · 2018-06-08

Okay, that was a very novice mistake on my side. Now that I have figured it out, let me share it here in case someone else faces the same problem.

It was in the assembly step (using Cufflinks): I used the option -G/--GTF (which simply quantitates against the reference transcript annotation) instead of -g/--GTF-guide (which uses the reference transcript annotation to guide the assembly).

Now I am able to find the novel elements in the subsequent steps of the Cufflinks package.
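To make the difference concrete, here is a small sketch that prints the two Cufflinks invocations side by side, followed by the cuffcompare call on the guided assembly. It only echoes the commands, so nothing is run. The BAM file name, the output directory names, and the helper `cufflinks_cmd` are hypothetical; `ano.gff3` is the annotation file from my question.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own.
ANNOTATION="ano.gff3"
BAM="CLV_2.sorted.bam"

# Print the Cufflinks command for a given mode.
#   quantify -> -G : estimate abundance of reference transcripts only,
#                    so nothing novel can be assembled
#   guided   -> -g : reference-guided assembly, novel transcripts allowed
cufflinks_cmd() {
    case "$1" in
        quantify) flag="-G" ;;
        guided)   flag="-g" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    echo "cufflinks $flag $ANNOTATION -o cufflinks_$1 $BAM"
}

cufflinks_cmd quantify
cufflinks_cmd guided

# Compare the guided assembly against the same annotation.
echo "cuffcompare -r $ANNOTATION -o guided_cmp cufflinks_guided/transcripts.gtf"
```

With -G, every transcript in `transcripts.gtf` comes straight from the annotation, which is why cuffcompare reported zero novel exons, introns and loci in my original run.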