How would sequence overrepresentation and duplication level affect the quality of a de novo transcriptome assembly?


9.7 years ago

guillermo.ponz.segrelles ▴ 30

Hello everyone,

I am preparing two files containing several million Illumina paired-end RNA-Seq reads for a de novo assembly with Trinity. As I posted the other day, I have some doubts about how to prepare the datasets to obtain the best possible transcriptome assembly.

This time my question is how the overrepresentation of some sequences would affect the assembly. My datasets have deep coverage and, as a result, some (non-artifact) sequences are strongly overrepresented (some of them account for up to 0.2% of all sequences) and the sequence duplication level is very high (approx. 73%). Are these parameters important for the quality of the assembly? If so, how can I address this? Should I normalize the datasets before performing the assembly?
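To check figures like these before deciding whether to normalize, both quantities can be recomputed directly from the reads. Below is a minimal Python sketch, assuming unwrapped four-line FASTQ records; `read_fastq_sequences` and `duplication_report` are hypothetical helper names, and the 0.1% default threshold is an illustrative choice, not a value from this thread. It counts exact matches within a single file and ignores mate pairing, so its numbers may not match a QC tool's report, since such tools can define and sample these metrics differently.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record.

    Assumes unwrapped records; a truncated final record is silently dropped.
    Works on a list of lines or an open file handle.
    """
    it = iter(lines)
    # zip on the same iterator groups consecutive lines four at a time
    for _header, seq, _plus, _qual in zip(it, it, it, it):
        yield seq.strip()


def duplication_report(
    seqs: Iterable[str], min_fraction: float = 0.001
) -> Tuple[float, List[Tuple[str, int, float]]]:
    """Return (duplicate_fraction, overrepresented) for a collection of reads.

    duplicate_fraction: share of reads that are exact copies of an earlier read.
    overrepresented: (sequence, count, fraction) for every sequence whose count
    is at least min_fraction of all reads, most frequent first.
    """
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return 0.0, []
    # Every read beyond the first copy of each distinct sequence is a duplicate
    duplicate_fraction = (total - len(counts)) / total
    overrepresented = [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    ]
    return duplicate_fraction, overrepresented
```

For a real file, `duplication_report(read_fastq_sequences(handle))` works with an open file `handle`, because file objects iterate line by line. Note that the `Counter` holds every distinct sequence in memory, which can be large for several million reads.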

I would be very grateful if someone could help me with this (at least for me) puzzling issue.

next-gen Assembly Trinity RNA-Seq • 2.4k views


Check the thread "Is It Safe To Remove Exact Duplicate Reads In The Denovo Transcriptome Assembly?"

Reply by Nestor Wendt, 9.7 years ago
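Removing exact duplicates, as discussed in the linked thread, discards every repeated read regardless of how deeply its transcript is covered. An alternative is in silico (digital) normalization, which caps the coverage of highly expressed transcripts while keeping reads from rarely expressed ones. Below is a minimal Python sketch of the median k-mer abundance rule behind khmer's normalize-by-median; the function names and the `k` and `cutoff` defaults are illustrative assumptions, and this is not Trinity's own normalization code. For simplicity it treats reads individually and ignores reverse complements, whereas real tools typically count canonical k-mers and keep or discard both mates of a pair together.

```python
from collections import Counter
from statistics import median
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (none if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def digital_normalization(
    reads: Iterable[str], k: int = 20, cutoff: int = 20
) -> Iterator[str]:
    """Stream reads, keeping one only while its estimated coverage is below cutoff.

    Coverage is estimated as the median count of the read's k-mers among the
    reads kept so far. Reads shorter than k have no k-mers and are discarded.
    """
    counts: Counter = Counter()
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # too short to estimate coverage: discard
        # Counter returns 0 for unseen k-mers without inserting them
        estimate = median([counts[km] for km in read_kmers])
        if estimate < cutoff:
            # Only kept reads contribute to the k-mer counts
            counts.update(read_kmers)
            yield read
```

Because only kept reads update the counts, a transcript stops contributing reads once its estimated coverage reaches `cutoff`, while reads from low-coverage transcripts pass through unchanged.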
