What is the correct percentage of sense-strand transcripts in directional de novo assemblies?
21 months ago
Lada ▴ 30

Hi guys,

I have a question related to the quality check of my RNAseq data.

Background: I isolated the RNA (5 different species, triplicates, 3 individuals pooled in each sample), and sent it for sequencing in China where they did library prep, sequencing and some downstream data analysis including assembly and CDS prediction. The important thing is that it was supposed to be DIRECTIONAL LIBRARY PREP AND ASSEMBLY ASSESSMENT (paired-end, 2x150 bp).

Issue: I looked at the CDS prediction results provided by the company and realised that the TransDecoder output is 50% (+) strand sequences and 50% (-) strand sequences.
note 1: I checked with the sequencing company which library prep kit they used, and they confirmed it's an NEB kit, RF stranded.
note 2: I don't have their Trinity command, just the final assembly. I only know some basic parameters such as kmer size, kmer_cov, min_glue and contig_length, with no mention of strandedness (the --SS_lib_type parameter), so I can only BELIEVE they used the --SS_lib_type RF flag. If this flag is not used, Trinity treats the input data as non-stranded and makes a non-directional assembly.

What I did: as an additional check, I made my own de novo assemblies with Trinity, where I was sure that I used the RF flag for the stranded library type. I then ran TransDecoder and got approx. 70% (+) strand and 30% (-) strand.
note 1: I didn't do a Corset step, so my TransDecoder run is on Trinity transcripts, not clusters. But I guess I would get a similar percentage at the cluster level...
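For anyone who wants to reproduce this kind of (+)/(-) tally, here is a minimal sketch, not necessarily how the numbers above were obtained. It assumes the ORF predictions are available as a standard tab-separated GFF3 file with the strand in column 7, and that counting `CDS` records is the right unit; the function names `strand_counts` and `sense_fraction` are made up for illustration.

```python
from collections import Counter


def strand_counts(gff3_lines, feature="CDS"):
    """Count records per strand for one feature type in GFF3 text.

    Assumption: standard 9-column, tab-separated GFF3 with the feature
    type in column 3 and the strand ('+' or '-') in column 7.
    """
    counts = Counter()
    for line in gff3_lines:
        # Skip blank lines and GFF3 comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Ignore malformed rows and other feature types (gene, mRNA, exon...).
        if len(fields) < 9 or fields[2] != feature:
            continue
        counts[fields[6]] += 1
    return counts


def sense_fraction(counts):
    """Fraction of records on the (+) strand among (+) and (-) records."""
    total = counts["+"] + counts["-"]
    return counts["+"] / total if total else float("nan")
```

It could be called on an open file handle, for example `with open("my_assembly.transdecoder.gff3") as fh: print(sense_fraction(strand_counts(fh)))`, where the file name is a placeholder. Counting one record type keeps each predicted ORF from being tallied several times through its gene, mRNA and exon rows.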

Question: What I have learned so far is that if a strand-specific assembly was made, the ORFs from TransDecoder will largely show up on the sense strand of the transcripts (+), and this fraction should be above 90%. Is that correct?
From the results I just described, I conclude that the assemblies made by the outsourced company were probably not strand-specific (50/50%), and that in addition the library preparation may not have been fully successful (which could explain why I get only 70% sense-strand ORFs when I make a stranded assembly on my own).

I hope I explained this well, and please excuse me if I said some nonsense; I'm still learning bioinformatics. :)

Lada

Trinity strand-specificity RNA-seq Transdecoder • 580 views

Forgot to mention - each species is a separate de novo assembly, of course. Actually, it is not very important, I just mentioned the number of species/assemblies to point out that I see a similar problem repeated in every case so it's not a coincidence or an isolated example.
