I wonder if anybody has seen a similar number of contigs. Do you consider them real splice variants or assembly errors?

The Trinity assembly was made from HiSeq 2x150 paired-end reads from ~110 mammalian brain samples: ~6.3 billion read pairs in total, 1,907,297 Mbases. I can't say how many reads were discarded after trimming, but the mean read quality doesn't seem unusual: % of >= Q30 Bases: 90.88; Quality Score: 37.96.

Trinity parameters were default, i.e. they included "insilico_read_normalization.pl --max_cov 50".

Here are the Trinity assembly metrics:
n_seqs 3236542
smallest 201
largest 20360
n_bases 2340786001
mean_len 723.23671
n_under_200 0
n_over_1k 642187
n_over_10k 609
n_with_orf 222378
mean_orf_percent 34.14792
n90 291
n70 594
n50 1136
n30 1954
n10 3685
gc 0.45103
bases_n 0
proportion_n 0
Thanks, Vlad
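
For readers unfamiliar with the n10 to n90 rows above: they are Nx statistics, where Nx is the contig length at which contigs of that length or longer contain at least x% of all assembled bases. Below is a minimal Python sketch of that definition. The function name `nx` and the toy length list are made up for illustration, and the tool that produced the table above may differ in details such as tie handling.

```python
def nx(lengths, x):
    """Return the Nx statistic for a list of contig lengths.

    Nx is the length L of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first
    reaches x percent of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


# Toy example with hypothetical contig lengths (not from the assembly above).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
for x in (10, 50, 90):
    print(f"N{x} = {nx(toy_lengths, x)}")
```

A low N50 next to a very large contig count, as in the table above, means most of the assembled bases sit in short contigs. N50 alone cannot say whether those are real isoforms or fragments.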
This is Trinity FAQ #1; it has been asked time and time again. That said, 3 million contigs really is a lot, far more than the "a lot" I have usually observed, which is in the range of 100-500 thousand. I've found the ExN50 statistic to be really useful, particularly this part: