Minimum or Optimal RPKM Value to Determine Whether a Transcript Is Significant
12.8 years ago
Prakki Rama

Hello all,

Could I please ask:

  1. Does a high RPKM value always indicate that a transcript is significant? How reliable is it? If so, what would be an optimal RPKM value for deciding whether a transcript is significant or not?

  2. Are there any other parameters I can use to reduce the number of contigs from the de novo assembly and concentrate only on significant transcripts?

Thanks in advance.

11.2 years ago

What my lab does is add ERCC spike-ins to the samples. They are polyadenylated RNAs of known concentration. So you can look at them, and if, say, spike-ins with an RPKM of 2-10 are still behaving linearly (their measured RPKM still tracks their known input concentration), then it's probably safe to say that real transcripts with RPKMs that low are behaving linearly too.
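A minimal Python sketch of that spike-in check, assuming you already have, for each ERCC spike-in, its known input concentration and its measured RPKM as two parallel lists (how you pull these out of your quantification output is not assumed here). It fits log2(RPKM) against log2(concentration) using only the spike-ins whose RPKM falls inside a chosen window; in log-log space, a slope near 1 with R² near 1 suggests measured RPKM is roughly proportional to input in that range. The function name `loglog_fit` and its window parameters are hypothetical, not part of any ERCC tool.

```python
import math

def loglog_fit(concentrations, rpkms, rpkm_min=0.0, rpkm_max=float("inf")):
    """Least-squares fit of log2(RPKM) = a + b * log2(concentration),
    using only spike-ins whose measured RPKM is in [rpkm_min, rpkm_max].
    Returns (slope, r_squared). Hypothetical helper for illustration."""
    pts = [(math.log2(c), math.log2(r))
           for c, r in zip(concentrations, rpkms)
           if c > 0 and r > 0 and rpkm_min <= r <= rpkm_max]
    if len(pts) < 3:
        raise ValueError("need at least 3 spike-ins with RPKM > 0 in the window")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    if sxx == 0 or syy == 0:
        raise ValueError("no spread in concentration or RPKM within the window")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared
```

For example, `loglog_fit(conc, rpkm, 2, 10)` looks only at the 2-10 RPKM band. How close to 1 the slope and R² must be before you call the response "linear" is a judgment call for your own data.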

In my lab, with the experiments we run and the purposes of those experiments, we've been setting a loose cut-off at 0.5 RPKM, or 1 RPKM to be more stringent. But I wouldn't count on that value necessarily being applicable to your lab or your experiments.
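For reference, RPKM = reads mapped to the transcript × 10^9 / (transcript length in bp × total mapped reads). Below is a minimal sketch of computing it and applying a loose (0.5) or stricter (1.0) cut-off like the ones described, assuming per-transcript read counts and lengths are held in dicts keyed by transcript ID. The function names and the transcript IDs in any usage are hypothetical.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def passing_cutoff(counts, lengths, total_mapped_reads, cutoff=0.5):
    """Return {transcript_id: rpkm} for transcripts with RPKM >= cutoff.

    counts:  {transcript_id: mapped read count}
    lengths: {transcript_id: transcript length in bp}
    """
    kept = {}
    for tid, n_reads in counts.items():
        value = rpkm(n_reads, lengths[tid], total_mapped_reads)
        if value >= cutoff:
            kept[tid] = value
    return kept
```

The default of 0.5 only mirrors the loose cut-off mentioned above; the threshold that makes sense for your experiment should come from how your own spike-ins behave.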

11.2 years ago

It really depends on what you mean by "significant". Reading between the lines, it seems as though you want to separate 'real' contigs from assembly artefacts. If that's the case, you should think carefully before discarding transcripts with a low RPKM.

There is no minimum: a contig representing a real transcript can have very few reads mapping to it, and therefore an extremely low RPKM. Equally, a high RPKM doesn't guarantee that the contig represents a real transcript. We often see chimeric contigs, where fragments from two or more different transcripts have been assembled into one contig. These chimeras often have high RPKM values, even though they are artefacts.

So the answer to your question 1 is no: a high RPKM does not mean you can be confident in the transcript; it isn't reliable. Thus there is no appropriate RPKM threshold for making such a decision.

As for question 2, it really depends on what you want to do with your assembled transcripts. Are you performing differential expression? Motif discovery? Are you interested in a particular set of genes?


@Richard: Understood, thank you. For question 2: yes, I just want to focus on a set of sequences that are reliable for downstream analysis such as differential expression. My assembly seems to be heavily fragmented, resulting in hundreds of thousands of contigs.

11.2 years ago
ThePresident

Would it be safe to plot the distribution of all RPKM values (which should look roughly normal), and then say that genes within ±1 sigma of the mean are "average/moderately" expressed, those above that are highly expressed, and those below are lowly expressed? Overall, you'd have 68.2% of genes at average expression, 15.9% at low and 15.9% at high expression. It's not really experimental evidence (although you derive those numbers from your data), but it's a basically logical assumption. I doubt that adding polyA spike-ins to your RNA-seq library will give a better conclusion, since they will never behave like mRNAs with all their respective complexity.
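A minimal sketch of that ±1 sigma binning. It assumes the values are log-transformed first (log2(RPKM + 1) here; the pseudocount of 1 is an arbitrary assumption that keeps zero-RPKM transcripts finite) and uses the sample mean and standard deviation. The labels are relative to this one dataset, and they only reproduce the 15.9/68.2/15.9 split if the log values really are close to normal. The function name is hypothetical.

```python
import math
import statistics

def classify_by_sigma(rpkms, pseudocount=1.0):
    """Label each transcript 'low', 'average' or 'high' depending on whether
    log2(RPKM + pseudocount) lies below, within, or above mean +/- 1 SD.

    rpkms: {transcript_id: RPKM value}
    """
    logs = {tid: math.log2(v + pseudocount) for tid, v in rpkms.items()}
    values = list(logs.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    labels = {}
    for tid, x in logs.items():
        if x < mu - sd:
            labels[tid] = "low"
        elif x > mu + sd:
            labels[tid] = "high"
        else:
            labels[tid] = "average"
    return labels
```

Note that these bins describe relative expression only; they say nothing about whether a contig represents a real transcript.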


"should give a normal distribution" <- that's a big assumption. Do you typically see that in your data? I would not bet on it.


Honestly, yes. I don't know if others can confirm this, but I see it in my data. Of course, you have to log-transform the RPKM values; otherwise the dispersion is enormous because of the extreme values. I've also seen it in at least one recent paper, but I can't find the reference right now.


OK, interesting. It doesn't hold in the tissue data I am currently looking at (log FPKM values) but maybe it holds for other kinds of samples.
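One quick way to see whether the normality assumption holds for a given set of samples, sketched under the same assumptions (log2(RPKM + 1), with the pseudocount chosen arbitrarily): compare the fraction of log values within one standard deviation of the mean to the ~68% expected under a normal distribution, and check that the sample skewness is near 0. This is a rough descriptive check rather than a formal test; a histogram or Q-Q plot of the log values is the usual visual complement. The function name is hypothetical.

```python
import math
import statistics

def normality_summary(rpkms, pseudocount=1.0):
    """Return (fraction within 1 SD of the mean, skewness) for
    log2(RPKM + pseudocount). Roughly (0.68, 0.0) for normal data."""
    logs = [math.log2(v + pseudocount) for v in rpkms]
    mu = statistics.mean(logs)
    sd = statistics.pstdev(logs)  # population standard deviation
    if sd == 0:
        raise ValueError("all values identical; spread is zero")
    within = sum(1 for x in logs if abs(x - mu) <= sd) / len(logs)
    skewness = sum(((x - mu) / sd) ** 3 for x in logs) / len(logs)
    return within, skewness
```

A strongly skewed result, or a within-1-SD fraction far from 0.68, would suggest that the ±1 sigma percentages do not apply to that dataset.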
