What does noise look like in the case of de novo transcriptome assembly?
9.9 years ago

I've been involved with an RNA-Seq study of a non-model organism, and I have some lingering questions.

The study focused on chemotype differentiation in a non-model plant species. There are six distinct chemotypes (groups), and for each group the lab sampled three species. De novo assembly produced ~300k transcripts in total, of which ~50k were above a certain length threshold. My job was to annotate the transcripts and perform differential expression analysis. I had no information about the sequencing procedure, so I couldn't control for batch effects, and I had to compile the transcript-by-sample abundance matrix myself from a weird proprietary file format.

My question is: how does noise typically manifest itself in this kind of data? From the literature I gathered that if you have count data (which I didn't; I used transcripts per million to guard against unequal sequencing depth), the sequencing noise ("shot noise") can be modeled with a Poisson distribution, and the Negative Binomial is preferred because it also accounts for actual biological variation.
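To make the Poisson versus Negative Binomial distinction concrete, here is a minimal simulation sketch using only the Python standard library. The mean count and dispersion are made-up values for illustration, not estimates from my data, and the helper names (`poisson_draw`, `nb_draw`) are hypothetical. It treats the Negative Binomial as a Gamma-Poisson mixture: the Gamma part stands in for biological variation in the true abundance, and the Poisson part for shot noise.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for small to moderate lam; not meant for very large means.
    """
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nb_draw(mean, dispersion, rng):
    """Draw one Negative Binomial value as a Gamma-Poisson mixture.

    The Gamma rate has mean `mean` and variance `dispersion * mean**2`,
    so the resulting count has variance mean + dispersion * mean**2.
    """
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson_draw(lam, rng)


rng = random.Random(0)
mean = 20.0        # hypothetical expected count for one transcript
dispersion = 0.2   # hypothetical dispersion (squared biological CV)
n_draws = 20_000

poisson_counts = [poisson_draw(mean, rng) for _ in range(n_draws)]
nb_counts = [nb_draw(mean, dispersion, rng) for _ in range(n_draws)]

# Shot noise alone: variance is close to the mean.
poisson_var = statistics.pvariance(poisson_counts)
# With biological variation on top: variance grows quadratically with the mean.
nb_var = statistics.pvariance(nb_counts)
expected_nb_var = mean + dispersion * mean ** 2

print(f"Poisson variance: {poisson_var:.1f} (mean {mean})")
print(f"NB variance: {nb_var:.1f} (theory {expected_nb_var:.1f})")
```

The point of the sketch is only the variance-to-mean relationship: under pure shot noise the two are roughly equal, while the extra dispersion term inflates the variance for the same mean.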

Does noise mean that a part of the transcript abundance estimate has a stochastic component you can never really be sure of? Or does it mean that the assembled transcripts are liable to be false in terms of the true underlying transcript sequence?

Any insight would be greatly appreciated, thanks.

RNA-Seq next-gen Assembly • 2.2k views

What was used to complete the assembly? What was your DE tool of choice?

To check that a difference in a particular transcript's expression level is not just noise (in differential expression analysis), you would look at the fold change (FC) relative to your controls, and also at the false-discovery rate.
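As a rough sketch of those two quantities, assuming you already have per-transcript mean abundances and raw p-values from whatever test your DE tool runs: the function names, the pseudocount, and the example numbers below are all hypothetical, and the FDR adjustment shown is the standard Benjamini-Hochberg procedure, which may or may not be what your tool uses.

```python
import math


def log2_fold_change(treated_mean, control_mean, pseudocount=1.0):
    """Log2 fold change of treated over control.

    The pseudocount (an assumption here) avoids division by zero
    for transcripts with no detected expression.
    """
    return math.log2((treated_mean + pseudocount) / (control_mean + pseudocount))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up numbers for four transcripts, purely for illustration.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_pvalues)
example_lfc = log2_fold_change(7.0, 1.0)
```

A transcript would then typically be called differentially expressed only if it passes both a fold-change cutoff and an adjusted p-value cutoff; the cutoffs themselves are a judgment call.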
