Question

strand-specific transcriptome Oases vs. CLC

10.2 years ago

wd • 0

Hi

I assembled an animal transcriptome de novo using strand-specific paired-end Illumina sequence data and the Oases/Velvet software package, which supports strand-specific data. Using the same sequence data, I also assembled a transcriptome with CLC software (CLC Genomics Workbench, which does not support strand-specific data for de novo assembly). Comparing these two transcriptomes (Oases vs. CLC) for several reference genes (> 50) showed that the CLC assembly was much better than the Oases one: in the CLC transcriptome, genes were not fragmented into several contigs, and a larger number of full-length genes were assembled.

I understand that strand-specific sequence data is very useful for measuring strand-specific expression, but is it also favourable to use strand-specific information when assembling a transcriptome? A literature search didn't leave me much wiser.

Regards

Wannes

next-gen RNA-Seq Assembly • 2.9k views

updated 2.7 years ago by Lada ▴ 30 • written 10.2 years ago by wd • 0

Ram · Answer 1 · 2014-09-09

I've done some tests where I performed two assemblies of the same set of stranded PE data with Trinity, one specifying strandedness and the other specifying non-strandedness. Then I mapped the stranded PE reads back to see how many reads would be mapped in mixed orientations in the stranded and unstranded assemblies.

Any transcript with more than 5 reads mapping in a single direction, I designated single orientation. Any transcript with more than 5 reads mapping in both directions, I designated mixed orientation.
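The rule above can be sketched in Python. This is only an illustration, not the script used for the test: it assumes the forward and reverse read counts per transcript have already been extracted from the alignments, it reads "single orientation" as more than 5 reads in exactly one direction, and it adds a third "undetermined" label for transcripts that meet neither condition. The function names and the `counts` layout are hypothetical.

```python
from collections import Counter

MIN_READS = 5  # threshold from the rule above: "more than 5 reads"


def classify_orientation(forward, reverse, min_reads=MIN_READS):
    """Label one transcript from its forward/reverse mapped read counts.

    Assumption: "single" means only one direction exceeds the threshold,
    "mixed" means both do, anything else is "undetermined".
    """
    fwd_ok = forward > min_reads
    rev_ok = reverse > min_reads
    if fwd_ok and rev_ok:
        return "mixed"
    if fwd_ok or rev_ok:
        return "single"
    return "undetermined"


def summarize(counts):
    """Return the fraction of transcripts per label.

    counts: dict mapping transcript id -> (forward_reads, reverse_reads)
    (hypothetical layout, filled from whatever mapper output is available).
    """
    total = len(counts)
    if total == 0:
        return {}
    tally = Counter(classify_orientation(f, r) for f, r in counts.values())
    return {label: tally[label] / total
            for label in ("single", "mixed", "undetermined")}
```

Running `summarize` once on the counts from each assembly gives the per-assembly fractions that can then be compared side by side.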

For my libraries, I found ~25% single-direction and ~1% mixed-direction transcripts for the stranded assembly, and ~25% single-direction and ~2-3% mixed-direction transcripts for the unstranded assembly. So more transcripts had reads mapped in mixed directions in the unstranded assembly.

There were also far fewer transcripts assembled in the unstranded assembly (~180k vs. ~210k in the stranded one).

I think that, in terms of transcriptome assembly, strandedness doesn't seem to matter that much for the majority of transcripts. But for the small proportion where there may be antisense transcription, you might be fusing transcripts if you assemble without strand information.

score 0 · Answer 2 · 2022-03-14

Hello. I am new to bioinformatics, so I am wondering: is it possible to make an assembly using combined sequences (stranded and non-stranded)? I am making a de novo transcriptome assembly for one species without a reference genome, and I have stranded/directional sequence data (RNA from whole individuals) and non-stranded/standard sequence data (RNA from a particular tissue). So I was thinking of using both datasets to get a "better" transcriptome, but I don't know whether it is possible to combine those data.