Hi all, bioinformatics newbie here.
I have been trying different assemblers for de novo transcriptome assembly of a non-model organism. When running SOAPdenovo-Trans in paired-end mode with the default insert size setting of 200 bp, I understand that the scaffolds are built by linking contigs with ambiguous sequence (runs of N).
My questions are:
How detrimental is the presence of these ambiguous sequences for downstream RNA-seq analysis, e.g. differential gene expression, considering that they will end up inside the CDS?
How important is it to adjust the insert size parameter? Should I be optimizing it to minimize the amount of ambiguous sequence?
Would it be wise to use single-end mode instead to avoid introducing ambiguous sequences, even though I have paired-end read information?
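For context on the second question, here is a minimal sketch of how the ambiguous content per scaffold could be measured when comparing runs with different insert size settings. It assumes the gaps are written as runs of N in the scaffold FASTA; the function name `ambiguous_stats` and the example records are hypothetical, not part of SOAPdenovo-Trans.

```python
import re
from io import StringIO


def ambiguous_stats(handle):
    """Return {scaffold_id: (length, n_bases, n_gaps)} for a FASTA handle.

    Assumption: ambiguous linker sequence is written as runs of 'N'.
    """
    stats = {}
    name, chunks = None, []

    def record(rec_name, rec_chunks):
        # Join the wrapped sequence lines and count every run of N.
        seq = "".join(rec_chunks).upper()
        gaps = re.findall(r"N+", seq)
        stats[rec_name] = (len(seq), sum(len(g) for g in gaps), len(gaps))

    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                record(name, chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        record(name, chunks)
    return stats


# Hypothetical toy input standing in for a scaffold FASTA file.
example = ">scaffold1\nACGTNNNNACGT\nNNAC\n>scaffold2\nACGTACGT\n"
for scaffold_id, (length, n_bases, n_gaps) in ambiguous_stats(StringIO(example)).items():
    print(scaffold_id, length, n_bases, n_gaps, round(n_bases / length, 3))
```

On a real assembly the same function could be called on an open file handle for the scaffold FASTA, so the fraction of N bases can be compared across insert size settings or against a single-end run.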