Can I convert from PSI to RPKM?
5.7 years ago
majdabdu • 0

Hi,

I'm trying to say that gene-level expression doesn't tell you which isoforms are driving most of that expression (isoform can be significantly changed but gene-level expression isn't, or gene-level expression is increased but you don't know which isoform is driving that increase, etc.). I was using PSI to show that (PSI of one isoform is 0.9, so it's contributing 90% of the expression), but my PI thinks it might be a better idea to convert PSI to RPKM and show that for each isoform. They said it should be simple: multiply the RPKM of the gene-level expression by the PSI. e.g. if RPKM is 1500, and PSI is 0.9, then the expression of that isoform is 0.9*1500 = 1350 RPKM. Is this a sound way to do it? Isn't RPKM affected by gene length and therefore I would need to change that for each isoform? I'm a novice in bioinformatics, so please talk to me like a novice.

Thank you for your help!

splicing rna-seq gene assembly • 1.9k views
5.7 years ago
shawn.w.foley ★ 1.3k

The challenge here is that RPKM is a gene-wide measurement, whereas PSI is an exon junction-wide measurement. Even though read lengths have consistently increased, RNA-seq reads are still quite short. Because we don't know the entire transcript from a single read, we can't say confidently at what proportions isoforms are present. For example, maybe exon 1 has a PSI = 0.9, and exon 10 also has a PSI = 0.9. Does that mean that 90% of transcripts have both exon 1 and exon 10? Does that mean that 81% of the time you have both exons included? Or is it a totally different proportion? We really can't know, and that's why benchmarking has consistently shown that transcript-level expression estimates are much less reliable than gene-level estimates.
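
To make the exon 1 / exon 10 example concrete, here is a minimal Python sketch of what two exon-level PSI values can and cannot tell you. The function names are made up for illustration. The bounds are the usual limits on the overlap of two proportions, and the 81% figure only holds under the extra assumption that the two exons are included independently.

```python
def joint_inclusion_bounds(psi_a, psi_b):
    """Smallest and largest possible fraction of transcripts containing both exons."""
    lower = max(0.0, psi_a + psi_b - 1.0)  # exclusions overlap as little as possible
    upper = min(psi_a, psi_b)              # exclusions overlap as much as possible
    return lower, upper


def joint_inclusion_if_independent(psi_a, psi_b):
    """Joint inclusion under the (untestable from short reads) independence assumption."""
    return psi_a * psi_b


low, high = joint_inclusion_bounds(0.9, 0.9)
print(f"Both exons included in {low:.0%} to {high:.0%} of transcripts")
print(f"If independent: {joint_inclusion_if_independent(0.9, 0.9):.0%}")
```

With short reads that never span both exons, any value in that range is compatible with the data.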

That warning applies to the vast majority of genes, which have multiple splicing isoforms. However, if this gene has only two alternative isoforms, I don't see why you couldn't take the counts (rather than the RPKM), multiply them by the PSI, and then recalculate RPKM based on the length of each of the two isoforms.
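
A rough sketch of that two-isoform calculation in Python. All numbers (read count, isoform lengths, library size) are hypothetical, and the sketch assumes PSI can be treated as the share of the gene's reads that come from the inclusion isoform, which may not match how your PSI values were actually estimated.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def isoform_rpkm_from_psi(gene_count, psi, len_included_bp, len_excluded_bp,
                          total_mapped_reads):
    """Split gene-level counts by PSI, then normalise each isoform by its own length.

    Assumes exactly two isoforms and that PSI is a fraction of reads (an assumption).
    """
    count_included = gene_count * psi
    count_excluded = gene_count * (1.0 - psi)
    return (rpkm(count_included, len_included_bp, total_mapped_reads),
            rpkm(count_excluded, len_excluded_bp, total_mapped_reads))


# Hypothetical inputs: 9000 reads on the gene, PSI = 0.9,
# isoform lengths of 3000 bp and 2000 bp, 20 million mapped reads.
included, excluded = isoform_rpkm_from_psi(9000, 0.9, 3000, 2000, 20_000_000)
print(f"inclusion isoform: {included:.1f} RPKM")
print(f"exclusion isoform: {excluded:.1f} RPKM")
```

Because each isoform is divided by its own length, the result generally differs from simply multiplying a gene-level RPKM by the PSI.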


Unfortunately PSI is also used for isoform-level measures.


Thanks Shawn. This is a really helpful explanation. When you say that it would be okay to use counts to calculate the RPKM if the gene only has two isoforms, do you mean in the dataset or in general? If you're referring to the dataset, do you mean the whole dataset or only among significantly changed events?


I mean in the dataset. A gene might have two annotated isoforms, but the dataset you're analyzing could contain a novel isoform that was not previously annotated. In that case, assuming you have two isoforms when the data indicate three would not be appropriate. Conversely, a gene could have 10 annotated isoforms across multiple tissues, but any that are not expressed in your dataset are a moot point.


Makes sense, thank you!

5.7 years ago

Here we run into a problem with a lack of standardisation: unfortunately, PSI is also used for isoform-level quantification, not just for junction- or exon-centric quantification. My response is only valid if you are sure you have isoform-level PSI.

If you have isoform-level PSI, I agree with you: a relative measure (such as PSI or Isoform Fraction) is much more informative, mainly because it is very hard to identify isoform switches from absolute expression when gene expression also changes. Changes in gene expression and isoform switches are NOT mutually exclusive. In fact, ~30% of the isoform switches in the TCGA data in my R package IsoformSwitchAnalyzeR also have a gene-level log2FC > 1, and many of the changes in gene expression are driven by isoform switches.

A simple example:

Exp     cond1   cond2
iso1    131.15  813
iso2    262.3   406.5

And the corresponding relative values

IF      cond1   cond2
iso1    0.33    0.67
iso2    0.67    0.33

From which is it easier to realise there is a switch?
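
For anyone who wants to reproduce the relative values, here is a small Python sketch that computes the isoform fraction (IF) from the expression table above. The dictionary layout and function name are just illustrative choices.

```python
# Expression values copied from the example table above.
expression = {
    "cond1": {"iso1": 131.15, "iso2": 262.3},
    "cond2": {"iso1": 813.0, "iso2": 406.5},
}


def isoform_fractions(isoform_expression):
    """Each isoform's share of the summed (gene-level) expression."""
    gene_total = sum(isoform_expression.values())
    return {iso: value / gene_total for iso, value in isoform_expression.items()}


for condition, isoform_expression in expression.items():
    gene_total = sum(isoform_expression.values())
    fractions = isoform_fractions(isoform_expression)
    shares = ", ".join(f"{iso} IF={frac:.2f}" for iso, frac in fractions.items())
    print(f"{condition}: gene total={gene_total:.2f}; {shares}")
```

The gene total roughly triples between conditions, which is what hides the switch in the absolute values, while the fractions simply trade places.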

With regard to back-calculating the RPKM values from the PSI: you are right that if the PSI was calculated from the RPKM in the first place, you can simply do it as you suggest. But you need to check the source of the PSI values. Where did you get them from?
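
A minimal Python sketch of why the source matters. It assumes the PSI was defined as isoform RPKM divided by the summed RPKM of the gene's isoforms; only under that assumption does multiplying back recover the isoform values. The isoform names are hypothetical and the numbers are the ones from the question.

```python
def psi_from_rpkm(isoform_rpkm):
    """PSI defined as each isoform's share of the summed isoform RPKM (an assumption)."""
    gene_rpkm = sum(isoform_rpkm.values())
    return {iso: value / gene_rpkm for iso, value in isoform_rpkm.items()}


def rpkm_from_psi(psi, gene_rpkm):
    """Back-calculate isoform RPKM; valid only if PSI was derived as in psi_from_rpkm."""
    return {iso: fraction * gene_rpkm for iso, fraction in psi.items()}


# Hypothetical isoform names; values chosen to match the question (gene RPKM 1500, PSI 0.9).
original = {"isoA": 1350.0, "isoB": 150.0}
gene_rpkm = sum(original.values())
psi = psi_from_rpkm(original)
recovered = rpkm_from_psi(psi, gene_rpkm)

for iso in original:
    print(f"{iso}: PSI={psi[iso]:.2f}, recovered RPKM={recovered[iso]:.1f}")
```

If the PSI instead comes from junction or exon read counts, this round trip no longer holds.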

Hope this helps

Kristoffer


Thank you Kristoffer, your example gave me something to think about. In terms of the PSI values, a previous student got them using the MISO algorithm. As I understand it, this algorithm provides isoform-level PSI for alternative first and last exon events, but for most of the other events, it provides an exon-junction quantification. I guess this complicates things further. I will explain this to my PI.


You are welcome :-)

MISO can produce both isoform-centric and splice-site-centric PSI, so you need to be sure which you have. Alternatively, using Salmon/Kallisto (for quantification) + IsoformSwitchAnalyzeR (for switch identification and analysis) is quite straightforward, and as you can see here, visualisation of isoform switches with predicted functional consequences is quite easy.


Thanks for providing these resources! IsoformSwitchAnalyzeR looks like an awesome tool that fills a huge gap in the field. I wish my data were compatible, but in most cases I have included/excluded exon/intron coordinates instead of full-length isoforms :/


How did you originally map/quantify your data (before running MISO)? Maybe I can suggest a quick way to obtain transcript-level quantifications ;-)


This was done way before I came to the lab, but my understanding is that the reads were aligned with GSNAP, and the BAM files were then run in parallel through Cufflinks to get gene-level quantification and through MISO to get exon/intron-level quantification.
