Impact of ambiguous sequences from de novo transcriptome assembly on downstream RNA-seq analysis
2.7 years ago
cakes0976 • 0

Hi all, bioinformatics newbie here.

I have been trying different assemblers for de novo transcriptome assembly of a non-model organism. When using SOAPdenovo-Trans in paired-end mode with the default insert size of 200 bp, I understand that the scaffolds are constructed with ambiguous sequences to link the contigs.

My questions are:

  • How detrimental is the presence of these ambiguous sequences for downstream RNA-seq analysis, e.g. differential gene expression, considering that these sequences will be contained in the CDS?

  • How important is it to adjust the insert size parameter? Should I be optimizing it to obtain the fewest ambiguous sequences?

  • Would it be wise to use single-end mode instead, to prevent the inclusion of ambiguous sequences, even though I have paired-end read information?

transcriptome assembly rna-seq soapdenovo-trans denovo • 559 views
2.7 years ago
Dunois ★ 2.8k

Replying to your questions point by point:

  • As far as I understand, the end user really doesn't have to do anything about the ambiguities in the assembly graph. They are an artifact of having adapted a genome assembler for de novo transcriptome assembly. Xie et al. (2014) do highlight how they dealt with this problem.
  • I don't think you should be tweaking the insert size in silico; it should correspond to whatever the insert size of the data actually is.
  • If you have paired-end data, use the appropriate paired-end settings, unless you happen to have merged the reads prior to assembly.
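
If you want to see how much ambiguous sequence actually ended up in your scaffolds before worrying about it, a minimal sketch along these lines might help. It assumes the ambiguous bases are written as N in the scaffold FASTA; the function names (`read_fasta`, `n_fraction`, `summarize`) and the example records are made up for illustration and are not part of any assembler's tooling.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_fraction(seq: str) -> float:
    """Fraction of bases in seq that are N (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Map each record id to its fraction of N bases."""
    return {name: n_fraction(seq) for name, seq in read_fasta(lines)}


if __name__ == "__main__":
    # Hypothetical records; in practice, pass an open file handle
    # for your scaffold FASTA instead of this list.
    example = [">scaffold1", "ACGTNNNNAC", "GT", ">scaffold2", "ACGTACGT"]
    for name, frac in summarize(example).items():
        print(f"{name}\t{frac:.3f}")
```

For a real assembly you would call `summarize(open(path))` on the scaffold file and look at the distribution of the per-scaffold fractions; what counts as "too much" is a judgment call that this sketch does not answer.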

Honestly, I would recommend using Trinity or rnaSPAdes (if you're resource-constrained) over SOAPdenovo-Trans. Both of those produce perfectly acceptable assemblies with default parameters, even for data from non-model organisms (in my experience). Trinity has been my go-to assembler over the years, and it is extremely well supported by its developers. It also has an extensive ecosystem of downstream analysis tools and pipelines readily available.

I would also refer you to Hölzer and Marz (2019) for a recent comparison of de novo transcriptome assemblers. Table 3 from their paper might be helpful for you in choosing an assembler.
