Question

RNA Seq differential gene expression analysis

11.3 years ago

Onat • 0

Hi,

I am very new to RNA-Seq data analysis and I have a question about differential gene expression analysis. I have one sample from a drug-resistant cell line and one sample from a drug-sensitive cell line. Within a sample, I usually have more than one RPKM and/or expression value for each gene. My idea is to treat each of those values for a gene as one replicate, and then continue with the statistical analysis and fold-change calculation. Would that be possible or logical? I appreciate your help. Thank you in advance.

RNA-Seq differential gene expression analysis RPKM • 6.6k views

ADD COMMENT • link updated 4.0 years ago by Ram 45k • written 11.3 years ago by Onat • 0

Thanks a lot for your reply. The dataset contains RPKM and expression values. Do you have any suggestions for calculating the fold change for each gene while skipping the statistical test (since there are no real replicates)?

ADD REPLY • link 11.3 years ago by Onat • 0

You might just sum the various isoform metrics and then get the fold-change from those. Do include average expression too, since then you can filter out the huge fold-changes from lowly expressed genes.

ADD REPLY • link 11.3 years ago by Devon Ryan 105k
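
As a rough illustration of the suggestion above, here is a minimal Python sketch that sums isoform-level RPKMs per gene, takes the ratio between the two samples, and keeps the average expression so that lowly expressed genes can be filtered out. The gene names, RPKM values, and the `min_mean` cutoff of 1.0 are all made up for the example and are not from the thread.

```python
from collections import defaultdict

# Hypothetical isoform-level RPKMs, keyed by (gene, isoform), one dict per sample.
resistant = {("geneA", "iso1"): 5.0, ("geneA", "iso2"): 2.0, ("geneB", "iso1"): 0.02}
sensitive = {("geneA", "iso1"): 2.0, ("geneA", "iso2"): 1.5, ("geneB", "iso1"): 0.2}


def sum_by_gene(isoform_rpkm):
    """Collapse isoform-level RPKMs to one value per gene by summing."""
    totals = defaultdict(float)
    for (gene, _isoform), rpkm in isoform_rpkm.items():
        totals[gene] += rpkm
    return dict(totals)


def fold_changes(sample_a, sample_b, min_mean=1.0):
    """Return {gene: (fold_change, mean_expression)}.

    Genes whose average expression is below min_mean (an arbitrary,
    assumed cutoff) or whose denominator is zero are skipped.
    """
    a, b = sum_by_gene(sample_a), sum_by_gene(sample_b)
    results = {}
    for gene in a.keys() & b.keys():
        mean_expr = (a[gene] + b[gene]) / 2
        if mean_expr < min_mean or b[gene] == 0:
            continue
        results[gene] = (a[gene] / b[gene], mean_expr)
    return results


results = fold_changes(resistant, sensitive)
print(results)
```

In this toy input, geneB has a tenfold difference but a very low average expression, so it is dropped by the filter. This is the kind of spurious large fold change the filter is meant to catch.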

Can you be more specific please? How can I interpret the RPKM values to calculate the fold changes?

ADD REPLY • link 11.3 years ago by Onat • 0

The fold change is just their ratio. So if you have an RPKM of 5 in one sample and 2 in another, the fold change is 5/2 = 2.5 (or 2/5 = 0.4, depending on which sample you want things relative to). If you want log2 fold changes, just log2-transform the values and subtract.

ADD REPLY • link 11.3 years ago by Devon Ryan 105k
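
The arithmetic in the reply above can be checked with a few lines of Python. The function names are only illustrative. The sketch also shows that subtracting log2-transformed values gives the same result as taking log2 of the ratio.

```python
import math


def fold_change(rpkm_a, rpkm_b):
    """Fold change of sample A relative to sample B."""
    return rpkm_a / rpkm_b


def log2_fold_change(rpkm_a, rpkm_b):
    """log2 fold change, computed as log2(A) - log2(B)."""
    return math.log2(rpkm_a) - math.log2(rpkm_b)


fc = fold_change(5, 2)        # 2.5, A relative to B
inverse = fold_change(2, 5)   # 0.4, B relative to A
lfc = log2_fold_change(5, 2)  # same as log2(2.5), roughly 1.32

print(fc, inverse, lfc)
```

Swapping the reference sample only flips the sign of the log2 fold change, which is one reason the log scale is convenient for comparing up- and down-regulation.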

How should I choose the RPKM value for each gene when calculating the fold change? Can I just use the highest RPKM value for each gene in a sample?

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 11.3 years ago by Onat • 0

Either just sum the isoforms or take the median.

ADD REPLY • link 11.3 years ago by Devon Ryan 105k
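
To make the two options concrete, the sketch below collapses the isoform RPKMs of a single gene either by summing or by taking the median. The three RPKM values and the helper name `collapse_isoforms` are hypothetical.

```python
from statistics import median


def collapse_isoforms(rpkms, method="sum"):
    """Reduce a gene's isoform RPKMs to a single value."""
    if method == "sum":
        return sum(rpkms)
    if method == "median":
        return median(rpkms)
    raise ValueError(f"unknown method: {method}")


isoform_rpkms = [5.0, 2.0, 0.5]  # made-up values for one gene in one sample

gene_sum = collapse_isoforms(isoform_rpkms, "sum")        # 7.5
gene_median = collapse_isoforms(isoform_rpkms, "median")  # 2.0

print(gene_sum, gene_median)
```

The two summaries can differ a lot when one isoform dominates, so whichever is chosen should be applied in the same way to both samples before taking the ratio.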

Ok thanks a lot.

ADD REPLY • link 11.3 years ago by Onat • 0

Actually, I also know the proportion of each transcript variant of a gene, so I know how much each variant contributes to the overall expression of that gene. Can I weight the RPKM value of each transcript variant by its proportion and then get an overall RPKM for the gene?

For example, variant 1 of gene A accounts for 60% and has an RPKM of 5, and variant 2 of gene A accounts for 40% and has an RPKM of 2. The overall RPKM value for gene A would be (5 × 0.6) + (2 × 0.4) = 3.8.

ADD REPLY • link 11.3 years ago by Onat • 0
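
The weighting proposed in the example above can be written out as a short Python sketch. It only reproduces the poster's arithmetic for gene A. The function name is illustrative, and the check that the proportions sum to 1 is an added assumption.

```python
import math


def weighted_gene_rpkm(variants):
    """Proportion-weighted sum of variant RPKMs.

    variants is a list of (proportion, rpkm) pairs. The proportions
    are assumed to sum to 1 for a given gene.
    """
    total_proportion = sum(p for p, _ in variants)
    if not math.isclose(total_proportion, 1.0):
        raise ValueError("variant proportions should sum to 1")
    return sum(p * rpkm for p, rpkm in variants)


# Gene A from the example: variant 1 is 60% with RPKM 5, variant 2 is 40% with RPKM 2.
gene_a = weighted_gene_rpkm([(0.6, 5.0), (0.4, 2.0)])
print(round(gene_a, 2))
```

The result is rounded for printing because the floating-point sum may not be exactly 3.8.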

Something like that should work.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 11.3 years ago by Devon Ryan 105k

This way, I think the resulting RPKM value for each gene would be more reliable. Thanks again for your help.

ADD REPLY • link 11.3 years ago by Onat • 0

score 2 · Answer 1 · 2014-08-04

In the best-case scenario you might get lucky and the results will correspond to changes in isoform composition between the samples, but I really wouldn't recommend bothering with such an analysis. While some genes will have a very large number of meaningfully expressed isoforms, most will only have one or a couple, so you'll already end up not testing most genes. For the ones you do test, the results will be dominated by noise and by the fact that the RPKM values for each isoform are dependent on each other (i.e., an increase in isoform A will probably correspond to a decrease in B, which violates one of the more important premises of the statistical test I'm guessing you'll end up using). Further, even if you do find a difference, it's impossible to say whether it is due to the difference in treatment, because you have no replicates to look at. In short, I would recommend not wasting much time on a dataset like this; the experiment simply wasn't designed to give much in the way of useful output.

The best you can likely do is rank things by fold change or use a package like GFold.
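
For the "rank things by fold change" option, a minimal Python sketch might look like the one below. It is plain ranking by absolute log2 fold change and is not what GFold does. The gene-level RPKMs are invented, and the pseudocount of 1.0 is an assumption added here to keep lowly expressed genes from producing extreme ratios.

```python
import math

# Hypothetical gene-level RPKMs: gene -> (resistant, sensitive).
gene_rpkm = {
    "geneA": (7.0, 3.5),
    "geneB": (1.0, 8.0),
    "geneC": (4.0, 4.0),
}

PSEUDOCOUNT = 1.0  # assumed value; damps ratios for lowly expressed genes


def log2_fc(a, b, pseudocount=PSEUDOCOUNT):
    """log2 fold change of a relative to b, with a pseudocount added to both."""
    return math.log2((a + pseudocount) / (b + pseudocount))


# Sort genes by the magnitude of the change, largest first.
ranked = sorted(gene_rpkm, key=lambda g: abs(log2_fc(*gene_rpkm[g])), reverse=True)

for gene in ranked:
    print(gene, round(log2_fc(*gene_rpkm[gene]), 2))
```

Without replicates, such a ranking is only a prioritized list of candidates; it says nothing about statistical significance.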