Question

Minimum Or Optimal Rpkm Value To Find If A Transcript Is Significant

5

13.2 years ago

Prakki Rama ★ 2.7k

Hello all,

Could I please know:

1. Does a high RPKM value always mean that the transcript is significant? How reliable is it? If so, what would be an optimal RPKM value to pinpoint whether a transcript is significant or not?
2. Are there any other parameters to reduce the number of contigs from the de novo assembly and concentrate only on significant transcripts?

Thanks in advance.

rpkm • 11k views

ADD COMMENT • link updated 11.5 years ago by ThePresident ▴ 80 • written 13.2 years ago by Prakki Rama ★ 2.7k

score 2 · Answer 1 · 2013-10-06

What my lab does is add ERCC spike-ins to the samples. They are poly-A sequences of known concentration. So you can look at them and if, say, spike-ins with an RPKM of 2-10 are still behaving linearly, then it's probably safe to say that real transcripts with RPKMs that low are behaving linearly too.
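A minimal sketch of that linearity check, assuming you have a table of known spike-in concentrations and their observed RPKMs. The function name `loglog_fit`, the `spike_ins` numbers and the RPKM ceiling of 10 are made up for illustration and are not from any particular pipeline. It fits log2(RPKM) against log2(concentration) by ordinary least squares; a slope and R² close to 1 over the low range would suggest RPKM is still tracking input amount there.

```python
import math

def loglog_fit(known_conc, observed_rpkm):
    """Least-squares fit of log2(RPKM) against log2(concentration).

    Returns (slope, intercept, r_squared). All inputs must be > 0,
    so drop spike-ins with zero RPKM before calling this.
    """
    xs = [math.log2(c) for c in known_conc]
    ys = [math.log2(r) for r in observed_rpkm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical spike-in table: (known concentration, observed RPKM).
spike_ins = [(1, 0.6), (2, 1.1), (4, 2.3), (8, 4.2), (16, 8.9), (32, 17.0)]

# Look only at the low-abundance end (observed RPKM <= 10, an arbitrary ceiling).
low = [(c, r) for c, r in spike_ins if r <= 10]
slope, intercept, r2 = loglog_fit([c for c, _ in low], [r for _, r in low])
print(f"slope={slope:.2f}  r2={r2:.3f}")
```

Where the slope or R² starts to fall apart as you lower the ceiling gives a rough, data-driven hint of where a cut-off might sit for that experiment.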

In my lab, with the experiments we run and the purposes of those experiments, we've been setting a loose cut-off at 0.5 RPKM, or 1 to be more stringent. But I wouldn't count on that value necessarily being applicable to your lab or your experiments.

score 1 · Answer 2 · 2013-09-15

It really depends on what you mean by significant. Reading between the lines, it seems as though you want to separate 'real' contigs from assembly artefacts. If that's the case, you should think carefully before discarding transcripts with a low RPKM.

There is no minimum - a contig representing a real transcript can have very low numbers of reads mapping to it, and have an extremely low RPKM. Equally, a high RPKM doesn't guarantee that the contig represents a real transcript. We often see chimeric contigs - where fragments from two or more different transcripts have been assembled into one contig. These chimeras often have high RPKM values, even though they are artefacts.

So, the answer to your question 1 is no: a high RPKM does not mean you can be confident in the transcript, so it isn't a reliable indicator. Thus there is no appropriate RPKM threshold for making such a decision.
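To put a number on how low a real transcript's RPKM can go, here is a small sketch using the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The read count, contig length and library size are invented for illustration.

```python
def rpkm(reads_on_contig, contig_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return reads_on_contig * 1e9 / (contig_length_bp * total_mapped_reads)

# Hypothetical case: 20 reads on a 2 kb contig in a library of 40 million mapped reads.
print(rpkm(20, 2000, 40_000_000))  # 0.25
```

Twenty reads can be perfectly good evidence for a real transcript, yet a contig like this one would be discarded by a cut-off of 0.5 or 1 RPKM.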

As for question 2, it really depends on what you want to do with your assembled transcripts. Are you performing differential expression? Motif discovery? Are you interested in a particular set of genes?

score 1 · Answer 3 · 2013-10-06

1

11.5 years ago

ThePresident ▴ 80

Would it be safe to plot the distribution of all RPKM values (it should be a normal distribution), and then say that genes within +/-1 sigma of the mean are "average/moderately" expressed, genes above that range are highly expressed and genes below it are lowly expressed? Overall, you'd have 68.2% of genes with average expression, 15.9% with low and 15.9% with high expression. It's not really experimental evidence (although you derive it from your data), but it's basically a logical assumption. I doubt that throwing polyA sequences into your RNA-seq library will give a better conclusion, since those will never behave like mRNAs with all their respective complexity.
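A rough sketch of that binning, assuming the values are log-transformed first (as comes up further down the thread). The function name `classify_by_sigma` and the pseudocount of 1, added so that zero RPKMs can be logged, are choices made for this illustration. Note that the 68.2% / 15.9% / 15.9% split only comes out if the log values really are normally distributed.

```python
import math
import statistics

def classify_by_sigma(rpkm_by_gene, pseudocount=1.0):
    """Label each gene 'low', 'average' or 'high' using
    mean +/- 1 SD of log2(RPKM + pseudocount)."""
    logs = {gene: math.log2(value + pseudocount)
            for gene, value in rpkm_by_gene.items()}
    values = list(logs.values())
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    labels = {}
    for gene, x in logs.items():
        if x < mu - sigma:
            labels[gene] = "low"
        elif x > mu + sigma:
            labels[gene] = "high"
        else:
            labels[gene] = "average"
    return labels

# Hypothetical RPKM values keyed by contig name.
example = {"c1": 0.0, "c2": 3.0, "c3": 3.0, "c4": 3.0,
           "c5": 3.0, "c6": 3.0, "c7": 63.0}
print(classify_by_sigma(example))
```

The labels are relative to the sample itself, so they describe rank within the data rather than whether a contig is real.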

ADD COMMENT • link 11.5 years ago by ThePresident ▴ 80

1

"should give a normal distribution" <- that's a big assumption. Do you typically see that in your data? I would not bet on it.

ADD REPLY • link 11.5 years ago by Mikael Huss 4.8k

0

Honestly, yes. I don't know if others can confirm this, but I see it in my data. Of course, you have to log-transform the RPKM values, otherwise the dispersion is enormous due to the extreme values. I've also seen it in at least one recent paper, but I just can't find the ref right now.
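For anyone who wants to check this on their own data, here is a quick sketch that log-transforms the values and reports three shape statistics. The function name `shape_summary` and the pseudocount of 1 are assumptions made for this example. For a normal distribution you'd expect skewness near 0, excess kurtosis near 0, and roughly 68% of values within one standard deviation of the mean.

```python
import math

def shape_summary(rpkm_values, pseudocount=1.0):
    """Return (skewness, excess_kurtosis, fraction_within_1sd)
    for log2(RPKM + pseudocount), using population moments."""
    logs = [math.log2(v + pseudocount) for v in rpkm_values]
    n = len(logs)
    mean = sum(logs) / n
    m2 = sum((x - mean) ** 2 for x in logs) / n  # variance
    m3 = sum((x - mean) ** 3 for x in logs) / n
    m4 = sum((x - mean) ** 4 for x in logs) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    within = sum(1 for x in logs if abs(x - mean) <= sd) / n
    return skewness, excess_kurtosis, within

# Hypothetical RPKM values; replace with your own column of expression values.
values = [0.0, 0.4, 1.2, 2.5, 3.0, 7.8, 15.0, 31.0, 120.0, 900.0]
skew, kurt, frac = shape_summary(values)
print(f"skewness={skew:.2f}  excess_kurtosis={kurt:.2f}  within_1sd={frac:.2f}")
```

These are only summary numbers; a histogram or Q-Q plot of the log values is still the more direct way to see a second peak or a heavy tail.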

ADD REPLY • link 11.5 years ago by ThePresident ▴ 80

0

OK, interesting. It doesn't hold in the tissue data I am currently looking at (log FPKM values) but maybe it holds for other kinds of samples.

ADD REPLY • link 11.5 years ago by Mikael Huss 4.8k