Hi,
I'm working with RNA-seq data and trying to assess the quality of a de novo transcriptome assembly of mouse heart using Transrate. I got the following results:
Contig metrics
n_seqs (the number of contigs in the assembly): 37,921
smallest (the size of the smallest contig): 201
largest (the size of the largest contig): 11,914
n_bases (the number of bases included in the assembly): 27,936,062
mean_len (the mean length of the contigs): 736.69
n under 200 (the number of contigs shorter than 200 bases): 0
n over 1k (the number of contigs greater than 1,000 bases long): 8,060
n over 10k (the number of contigs greater than 10,000 bases long): 3
n with orf (the number of contigs that had an open reading frame): 10,925
mean orf percent (for contigs with an ORF, the mean % of the contig covered by the ORF): 63.41
N90: 286
N70: 630
N50: 1,226
N30: 2,055
N10: 3,580
gc (the proportion of bases that are G or C): 0.49
bases n (the number of bases that are N): 0
proportion n (the proportion of bases that are N): 0.0
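For reference, I am reading the Nx values above in the usual way: Nx is the contig length at which the contigs of that length or longer hold at least x% of all assembled bases. A minimal Python sketch of that calculation, assuming a plain list of contig lengths (the function name `nx` and the toy lengths are only illustrative and are not part of Transrate):

```python
def nx(lengths, x):
    """Return Nx: the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches x% of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy example (hypothetical lengths, not my assembly): 32 bases in total.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = nx(toy_lengths, 50)
toy_n90 = nx(toy_lengths, 90)
print(toy_n50, toy_n90)
```

With real data the list of lengths would come from the assembly FASTA; I have left that part out to keep the sketch short.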
Read mapping metrics
fragments (the number of read pairs provided): 12,282,839
fragments mapped (the total number of read pairs mapping): 7,036,174 (57%)
good mappings (the number of read pairs mapping in a way indicative of good assembly): 6,272,741 (51%)
bad mappings (the number and proportion of read pairs mapping in a way indicative of bad assembly): 763,433
potential bridges (the number of potential links between contigs that are supported by the reads): 0
bases uncovered (the number of bases that are not covered by any reads): 7,612,386 (27%)
contigs uncovbase (the number of contigs that contain at least one base with no read coverage): 16,814 (44%)
contigs uncovered (the number of contigs that have a mean per-base read coverage of < 1): 37,921
p_contigs_uncovered (the proportion of contigs that have a mean per-base read coverage of < 1): 1.0
contigs_lowcovered (the number of contigs that have a mean per-base read coverage of < 10): 37,921
p_contigs_lowcovered (the proportion of contigs that have a mean per-base read coverage of < 10): 1.0
contigs_segmented (the number of contigs that have >=50% estimated chance of being segmented): 2,754 (7%)
TRANSRATE ASSEMBLY SCORE: 0.1676
TRANSRATE OPTIMAL SCORE: 0.2625
TRANSRATE OPTIMAL CUTOFF: 0.1275
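As a sanity check that I copied the report correctly, the percentages can be recomputed from the raw counts. This is only a rough Python sketch: the variable names are my own shorthand for the values listed above, and the helper `pct` is hypothetical, not a Transrate function.

```python
# Raw counts copied from the report above.
n_seqs = 37_921
n_bases = 27_936_062
fragments = 12_282_839
fragments_mapped = 7_036_174
good_mappings = 6_272_741
bad_mappings = 763_433
bases_uncovered = 7_612_386
contigs_uncovbase = 16_814
contigs_segmented = 2_754


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Good and bad mappings should add up to the mapped fragments.
mappings_consistent = good_mappings + bad_mappings == fragments_mapped

mean_len = n_bases / n_seqs
print(f"mean_len: {mean_len:.2f}")
print(f"fragments mapped: {pct(fragments_mapped, fragments):.1f}%")
print(f"good mappings: {pct(good_mappings, fragments):.1f}%")
print(f"bad mappings: {pct(bad_mappings, fragments):.1f}%")
print(f"bases uncovered: {pct(bases_uncovered, n_bases):.1f}%")
print(f"contigs uncovbase: {pct(contigs_uncovbase, n_seqs):.1f}%")
print(f"contigs segmented: {pct(contigs_segmented, n_seqs):.1f}%")
print(f"good + bad == mapped: {mappings_consistent}")
```

The recomputed values agree with the rounded percentages in the report, so the counts themselves look internally consistent.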
I would like to know:
- Do these values indicate a good or a poor assembly?
- Based on your experience, which metrics should I give the most weight? Is there a protocol or set of best practices for evaluating a de novo transcriptome assembly?
- Would you suggest any other software besides Transrate?
Thank you!