Hi everybody,
My question is: what causes broken paired reads? I'm struggling with a de novo transcriptome assembly in CLC Genomics Workbench. First I trimmed the reads and removed duplicate reads, and then I ran the assembly in the same software. The quality of my sequencing data (Illumina, 100 bp, paired-end) seems very good: about 97% of the reads passed the trimming step, and only 1% of the total reads were duplicates. However, the assembly result is not satisfactory: about 330 million of the 370 million reads were reported as broken paired reads! I would highly appreciate it if you could let me know why there are so many broken paired reads and how I can fix this. Thanks in advance.
If you're using a commercial product like CLC, then ask their support team questions like this. You are paying for that support, after all.