Strand-specific transcriptome: Oases vs. CLC
10.2 years ago
wd • 0

Hi

I assembled an animal transcriptome de novo from strand-specific paired-end Illumina data using the Oases/Velvet software package, which supports strand-specific data. Using the same sequence data, I also assembled a transcriptome with CLC software (CLC Genomics Workbench, which does not support strand-specific data for de novo assembly). Comparing the two transcriptomes (Oases vs. CLC) for more than 50 reference genes revealed that the CLC assembly was much better than the Oases one: in the CLC transcriptome, genes were not fragmented into several contigs, and a larger number of full-length genes were assembled.

I understand that strand-specific sequence data is very useful for measuring strand-specific expression, but is it also favourable to use strand-specific information when assembling a transcriptome? A literature search didn't make me much wiser.

Regards

Wannes

next-gen RNA-Seq Assembly • 2.9k views
10.2 years ago

I've done some tests where I ran two Trinity assemblies on the same set of stranded PE data, one specifying strandedness and the other specifying non-strandedness. Then I mapped the stranded PE reads back to each assembly to see how many reads would map in mixed orientations in the stranded and unstranded assemblies.

Any transcript with more than 5 reads mapping in a single direction, I designated single orientation. Any transcript with more than 5 reads mapping in both directions, I designated mixed orientation.
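The designation rule above can be sketched roughly as follows. This is only an illustration, not the script used for the numbers in this post: the function name `classify_transcripts`, the `(transcript_id, strand)` input format, and the "unclassified" label for transcripts below the threshold are all assumptions, and each strand is assumed to already be the fragment-level orientation relative to the assembled transcript (for paired-end data, one value per read pair).

```python
from collections import Counter

MIN_READS = 5  # threshold from the post: "more than 5 reads"


def classify_transcripts(alignments, min_reads=MIN_READS):
    """Label each transcript by the orientation of the reads mapped to it.

    alignments: iterable of (transcript_id, strand) pairs, strand '+' or '-'.
    Returns a dict mapping transcript_id to 'single', 'mixed' or 'unclassified'.
    """
    fwd, rev = Counter(), Counter()
    for transcript_id, strand in alignments:
        if strand == "+":
            fwd[transcript_id] += 1
        elif strand == "-":
            rev[transcript_id] += 1
        else:
            raise ValueError(f"unexpected strand {strand!r}")

    labels = {}
    for tid in fwd.keys() | rev.keys():
        fwd_ok = fwd[tid] > min_reads
        rev_ok = rev[tid] > min_reads
        if fwd_ok and rev_ok:
            labels[tid] = "mixed"          # enough reads in both directions
        elif fwd_ok or rev_ok:
            labels[tid] = "single"         # enough reads in one direction only
        else:
            labels[tid] = "unclassified"   # assumption: too few reads to call
    return labels


# Toy input: t1 has 8 forward reads, t2 has 7 forward and 6 reverse, t3 has 3 reverse.
example = [("t1", "+")] * 8 + [("t2", "+")] * 7 + [("t2", "-")] * 6 + [("t3", "-")] * 3
labels = classify_transcripts(example)
```

In practice the `(transcript_id, strand)` pairs would come from the read-to-assembly alignments. A transcript that passes the threshold in one direction but has a few reads in the other is still called "single" in this sketch, which is one possible reading of the rule.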

For my libraries, I found ~25% single orientation and ~1% mixed orientation for the stranded assembly, and ~25% single orientation and ~2-3% mixed orientation for the unstranded assembly. So more reads mapped in mixed orientations in the unstranded assembly.

There were also far fewer transcripts assembled in the unstranded assembly (~180k vs. ~210k in the stranded one).

I think that, in terms of transcriptome assembly, strandedness doesn't seem to matter much for the majority of transcripts. But for the small proportion where there may be antisense transcription, an unstranded assembly might be fusing transcripts.

2.7 years ago
Lada ▴ 30

Hello. I am new to bioinformatics, so I am wondering: is it possible to build an assembly from combined stranded and non-stranded sequences? I am making a de novo transcriptome assembly for a species without a reference genome, and I have stranded/directional sequencing data (RNA from whole individuals) and non-stranded/standard sequencing data (RNA from a particular tissue). I was thinking of using both datasets to get a "better" transcriptome, but I don't know whether it is possible to combine those data.
