I ran cuffcompare on Cufflinks-assembled transcripts against an annotation file. I was expecting some novel transcripts to be found, but it gave the output (stats) below. Can anyone help me work out what may have gone wrong?
# Cuffcompare v2.2.1 | Command line was:
#cuffcompare CLV_2_transcripts.gtf -r ano.gff3
#
#= Summary for dataset: CLV_2_transcripts.gtf :
# Query mRNAs : 53667 in 32615 loci (42061 multi-exon transcripts)
# (10524 multi-transcript loci, ~1.6 transcripts per locus)
# Reference mRNAs : 54124 in 32619 loci (42815 multi-exon)
# Super-loci w/ reference transcripts: 32494
#--------------------|   Sn   |   Sp   |  fSn   |  fSp
        Base level:     100.0    100.0      -        -
        Exon level:     114.2    114.6    100.0    100.0
      Intron level:     100.0    100.0    100.0    100.0
Intron chain level:      98.2    100.0    100.0    100.0
  Transcript level:      98.4     99.2     87.9     88.7
       Locus level:     100.0    100.0    100.0    100.0
Matching intron chains: 42061
Matching loci: 32611
Missed exons: 7/193010 ( 0.0%)
Novel exons: 0/192188 ( 0.0%)
Missed introns: 1/132525 ( 0.0%)
Novel introns: 0/132525 ( 0.0%)
Missed loci: 7/32619 ( 0.0%)
Novel loci: 0/32615 ( 0.0%)
Total union super-loci across all input datasets: 32612
Why were you expecting novel transcripts to be found? Which species is this? Which GTF guide have you used?
In addition, Cufflinks, CuffCompare, etc. are outdated. Unless there is some legacy reason why you need to use these programs, you should instead be using HISAT2 and StringTie.
Sorry for the incomplete information.
The species is Arabidopsis. I used the latest version of the annotation (GTF guide) from Araport.
I have read the documentation for both pipelines (TopHat-Cufflinks and HISAT2-StringTie) in Nature Protocols. HISAT2 is definitely superior to TopHat in some benchmark reports, but I am using the STAR aligner.
I think most of the Cufflinks modules are still used in recent publications more often than StringTie. As for CuffCompare (which my question is about), it is the program that gffcompare, which is distributed separately alongside StringTie, was built from. Because of a library installation issue with gffcompare, I am still using CuffCompare. I don't think this will cause many problems (I may be wrong).
I am expecting some novel transcripts because the sequencing depth of our samples is very high.
I am sure a mistake has happened somewhere, but I haven't been able to figure out where.
Isn't cuffcompare normally used to compare two or more assembled transcriptome GTFs? You appear to be comparing just a single sample (to itself?). The syntax is:
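The usage line seems to have been dropped from this post. As a rough sketch, assuming the synopsis as I recall it from the cuffcompare manual linked below, the two common ways of calling it look like this. The `run` helper, the `DRY_RUN` variable and the `sampleA.gtf` / `sampleB.gtf` names are made up for illustration; the other file names are taken from the question.

```shell
# Synopsis (from the cuffcompare manual, quoted from memory):
#   cuffcompare [options]* <cuff1.gtf> [cuff2.gtf] ... [cuffN.gtf]

# Hypothetical helper: print each command, and only execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

# One assembly compared against a reference annotation (-r), as in the question.
# -o sets the output prefix; "cuffcmp" is also the default.
run cuffcompare -r ano.gff3 -o cuffcmp CLV_2_transcripts.gtf

# Several assemblies compared with each other and with the same reference.
run cuffcompare -r ano.gff3 -o cuffcmp sampleA.gtf sampleB.gtf
```

So a single query GTF plus `-r` is a valid call; the reference acts as the thing being compared against.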
Take a look at the other available options here: http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/
No, actually I am comparing my sample transcriptome (GTF) with an annotation file (GFF3 in my case). This is supposed to report novel transcripts (class code j in the output files) that have not been reported before in the annotation file.
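One way to check for class code `j` entries is to tally the class codes in the `.tmap` file that cuffcompare writes next to the query GTF. This is only a sketch: it assumes the file is tab-separated with a header line, `class_code` in column 3 and `cuff_id` in column 5, and the function names are invented here.

```shell
# Count transcripts per class code in a cuffcompare .tmap file.
# Assumes: tab-separated, one header line, class_code in column 3.
# "=" is a full intron-chain match, "j" a potentially novel isoform,
# "u" an unknown intergenic transcript.
count_class_codes() {
  awk -F '\t' 'NR > 1 { n[$3]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}

# List query transcript IDs (column 5, cuff_id) that carry a given class code.
transcripts_with_code() {
  awk -F '\t' -v code="$2" 'NR > 1 && $3 == code { print $5 }' "$1"
}

# Example calls (file name follows the <outprefix>.<input>.tmap pattern; adjust as needed):
#   count_class_codes cuffcmp.CLV_2_transcripts.gtf.tmap
#   transcripts_with_code cuffcmp.CLV_2_transcripts.gtf.tmap j
```

If every row comes back as `=`, the query GTF is effectively a copy of the reference, which would match the summary statistics in the question.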
Any novel transcripts would have been listed in the GTF produced by Cufflinks itself, at least in my experience. Had you used your GFF3 annotation file as the guide during alignment and assembly, you would surely have identified novel transcripts. On that note, as you used STAR for alignment, the tags required by Cufflinks / cuffcompare, i.e., those related to strand alignment, may not have been added to your aligned BAM.
I would re-align using TopHat2 / HISAT2, and then go from there.
Yes, the mistake was in the assembly step. The reference annotation file should be used as a guide for assembly, instead of just for quantifying known transcripts. Thanks for the help :)
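For anyone landing here later, my reading of this thread is that the difference comes down to which annotation flag Cufflinks is given: `-G` only quantifies the reference transcripts, while `-g` uses them as a guide and can also report novel ones. The sketch below also shows the STAR option that adds the XS strand attribute Cufflinks wants for unstranded data. All paths, the read file names, the output directory names and the `run` / `DRY_RUN` helper are assumptions for illustration.

```shell
# Hypothetical helper: print each command, and only execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

# Hypothetical paths: adjust to your own index, reads and annotation.
GENOME_DIR="star_index"
ANNOTATION="ano.gff3"
PREFIX="CLV_2_"
BAM="${PREFIX}Aligned.sortedByCoord.out.bam"

# 1) STAR alignment. --outSAMstrandField intronMotif adds the XS strand
#    attribute to spliced alignments, which Cufflinks needs for unstranded libraries.
run STAR --genomeDir "$GENOME_DIR" \
    --readFilesIn reads_1.fastq reads_2.fastq \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMstrandField intronMotif \
    --outFileNamePrefix "$PREFIX"

# 2a) -G: quantify reference transcripts only; no novel transcripts are assembled,
#     so cuffcompare against the same annotation will report everything as matching.
run cufflinks -G "$ANNOTATION" -o quant_only "$BAM"

# 2b) -g: reference-guided assembly; reference transcripts plus novel genes/isoforms.
run cufflinks -g "$ANNOTATION" -o guided_assembly "$BAM"
```

Running cuffcompare on the `transcripts.gtf` from the `-g` run is the case where class code `j` entries would be expected to show up.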
No problem bro.
What did you use for mapping, and how well did it go?
I used STAR for mapping and it was successful, with more than 85% of reads aligning to the reference genome.