I'm trying to do a de novo assembly of a transcriptome from about 250,000 454 cDNA reads from a Begonia species using Newbler, but the results aren't very good. If anyone has experience running Newbler, which parameters could I change to improve the assembly? We want to reassemble because, in a later step, we ran BLAST with the isotigs of each isogroup against the genome. About 50% of the isogroups with more than one isotig had isotigs matching different genome contigs, which suggests those isotigs were incorrectly grouped as alternative transcripts of the same gene.
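A minimal Python sketch of the BLAST check described above, in case it helps others reproduce it. It assumes tabular BLAST output (`-outfmt 6` with the default 12 columns, bitscore in column 12) and an isotig-to-isogroup mapping already extracted from the Newbler output; the dict format and all IDs below are hypothetical. For each isotig it keeps the best-scoring genome contig, then reports the fraction of multi-isotig isogroups whose isotigs land on more than one genome contig.

```python
from collections import defaultdict


def best_hits_from_outfmt6(lines):
    """Map each query (isotig) to the subject (genome contig) with the highest
    bitscore, reading BLAST tabular output (-outfmt 6, default 12 columns)."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def split_isogroups(isotig_to_isogroup, best_hits):
    """Return (fraction, isogroup IDs) for isogroups with >1 isotig whose
    isotigs' best hits fall on more than one genome contig.
    isotig_to_isogroup is a hypothetical dict parsed from the assembly output;
    isotigs without any BLAST hit are ignored."""
    members = defaultdict(list)
    for isotig, group in isotig_to_isogroup.items():
        members[group].append(isotig)

    multi = [group for group, tigs in members.items() if len(tigs) > 1]
    split = sorted(
        group for group in multi
        if len({best_hits[t] for t in members[group] if t in best_hits}) > 1
    )
    fraction = len(split) / len(multi) if multi else 0.0
    return fraction, split


# Toy example (made-up IDs and scores, not real data)
blast_lines = [
    "isotig1\tcontigA\t99.0\t500\t5\t0\t1\t500\t100\t600\t1e-200\t900",
    "isotig2\tcontigA\t95.0\t100\t5\t0\t1\t100\t700\t800\t1e-30\t150",
    "isotig2\tcontigB\t99.5\t450\t2\t0\t1\t450\t10\t460\t1e-190\t800",
    "isotig3\tcontigC\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t500",
    "isotig4\tcontigC\t98.0\t320\t6\t0\t1\t320\t200\t520\t1e-140\t480",
    "isotig5\tcontigD\t99.0\t250\t2\t0\t1\t250\t1\t250\t1e-120\t420",
]
isotig_to_isogroup = {
    "isotig1": "isogroup1", "isotig2": "isogroup1",
    "isotig3": "isogroup2", "isotig4": "isogroup2",
    "isotig5": "isogroup3",
}
fraction, split = split_isogroups(isotig_to_isogroup, best_hits_from_outfmt6(blast_lines))
print(fraction, split)  # 0.5 ['isogroup1']
```

Keep in mind that a fragmented genome assembly can produce the same pattern: a single real gene split across two genome contigs would also be counted as a split isogroup here.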
These are the parameters I used:
Input:
  Minimum read length: 20
Computation:
  Seed step: 12
  Seed length: 16
  Seed count: 1
  Minimum overlap length: 40
  Minimum overlap identity: 90
  Alignment identity score: 2
  Alignment difference score: -3
Selected options: Use duplicate reads, Extend low depth overlaps
Thanks for any help!
I would start by increasing the minimum overlap identity. Currently you require 90% nucleotide identity over at least 40 bp, which allows up to 4 mismatches in a 40 bp overlap. You could probably be stricter with modern 454 data.
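To see how the identity threshold translates into tolerated errors, here is a small Python sketch. It uses a simplified model (an assumption, not Newbler's documented scoring): identity = matching bases / overlap length, counting every error as one mismatch. 454 errors are mostly homopolymer-length indels rather than substitutions, so treat this only as a rough guide.

```python
def max_mismatches(overlap_len, min_identity_pct):
    """Largest number of mismatched bases an overlap of overlap_len bp can
    contain and still reach min_identity_pct identity.
    Simplified model (an assumption): identity = matching bases / overlap
    length; Newbler's own handling of indels and rounding may differ."""
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    return overlap_len * (100 - min_identity_pct) // 100


for identity in (90, 95, 98):
    print(f"{identity}% identity over 40 bp -> up to {max_mismatches(40, identity)} mismatches")
# 90% identity over 40 bp -> up to 4 mismatches
# 95% identity over 40 bp -> up to 2 mismatches
# 98% identity over 40 bp -> up to 0 mismatches
```

With a 40 bp minimum overlap, going from 90% to 95% halves the tolerated mismatches (4 to 2), and 98% tolerates none, so the identity threshold and the minimum overlap length interact and are best considered together.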