I have a merged transcriptome assembly of 728 accessions and compared it with the Arabidopsis reference annotation file (ftp://ftp.ensemblgenomes.org/pub/release-43/plants/gtf/arabidopsis_thaliana) using gffcompare. Its stats output is:
gffcompare -r /media/waqas/second/Analysis/TAIR10_genome/ensemble_gtf/Arabidopsis_thaliana.TAIR10.42.gtf -o gffcompare stringtie_merged.gtf
Summary for dataset: stringtie_merged.gtf
Query mRNAs : 126199 in 32235 loci (112943 multi-exon transcripts)
(17283 multi-transcript loci, ~3.9 transcripts per locus)
Reference mRNAs : 53516 in 32046 loci (42815 multi-exon)
Super-loci w/ reference transcripts: 27093
| Sensitivity | Precision |
Base level: 100.0 | 86.5 |
Exon level: 100.0 | 66.0 |
Intron level: 100.0 | 70.0 |
Intron chain level: 99.6 | 37.8 |
Transcript level: 99.6 | 42.2 |
Locus level: 100.0 | 89.3 |
Matching intron chains: 42644
Matching transcripts: 53316
Matching loci: 32046
Missed exons: 0/192402 ( 0.0%)
Novel exons: 27127/342721 ( 7.9%)
Missed introns: 1/132525 ( 0.0%)
Novel introns: 22825/189217 ( 12.1%)
Missed loci: 0/32046 ( 0.0%)
Novel loci: 3349/32235 ( 10.4%)
Total union super-loci across all input datasets: 32235
126199 out of 126199 consensus transcripts written in gffcompare.annotated.gtf (0 discarded as redundant)
Besides that, it generated five more files. I checked the literature and found that class code 'j' reflects novel isoforms, but the count of 'j' varies across the gffcompare output files:
gffcompare.annotated.gtf (count of j: 1000)
gffcompare.stringtie_merged.gtf.tmap (count of j: 2807)
gffcompare.tracking (count of j: 2876)
How can I determine the actual number of novel isoforms?
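One way to make the three counts comparable is to read the class code from the same kind of field in each file, rather than searching each line for the letter j. The Python sketch below is only an illustration under stated assumptions: that the .tmap file is tab-separated with a header row containing a class_code column, that the .tracking file has no header and carries the class code in its fourth column, and that the annotated GTF stores the code as a class_code attribute on transcript lines. The function names are hypothetical and not part of gffcompare.

```python
import csv
import re
from collections import Counter

# Assumption: class codes in the annotated GTF appear as class_code "x";
CLASS_CODE_RE = re.compile(r'class_code "([^"]+)"')


def count_tmap(lines):
    """Count class codes in a .tmap file (assumed: tab-separated, header row)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["class_code"] for row in reader)


def count_tracking(lines):
    """Count class codes in a .tracking file (assumed: no header, code in column 4)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


def count_annotated_gtf(lines):
    """Count class codes on 'transcript' feature lines of the annotated GTF."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # exon lines are skipped so each transcript counts once
        match = CLASS_CODE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage with the file names from the post:
# with open("gffcompare.stringtie_merged.gtf.tmap", newline="") as fh:
#     print(count_tmap(fh)["j"])
# with open("gffcompare.tracking") as fh:
#     print(count_tracking(fh)["j"])
# with open("gffcompare.annotated.gtf") as fh:
#     print(count_annotated_gtf(fh)["j"])
```

If the counts still differ after this, the remaining gap would have to come from how each file defines its rows (for example, one row per query transcript versus one row per combined transcript), which this sketch does not resolve.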
Hi waqaskhokhar999, I changed the topic tag of your post to 'question'; the 'tool' tag is reserved for announcing new tools and the like.