Hi,
I am very new to RNA-Seq data analysis and have a question about differential gene expression analysis. I have one sample from a drug-resistant cell line and one sample from a drug-sensitive cell line, and I usually have more than one RPKM and/or expression value per gene in each sample. My idea is to treat each of those values as a replicate for that gene and then continue with the statistical analysis and fold-change calculation. Would that be possible or logical? I appreciate your help. Thank you in advance.
Thanks a lot for your reply. My dataset contains RPKM and expression values. Do you have any suggestions for calculating the fold change for each gene while skipping the statistical test (since there are no real replicates)?
You might just sum the various isoform metrics for each gene and then get the fold change from those sums. Do include the average expression too, so that you can filter out the huge fold changes that come from lowly expressed genes.
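As a rough sketch of that suggestion in Python: the gene names, isoform labels, RPKM values, and the `min_mean` cutoff of 1.0 below are all made up for illustration, and the function names are hypothetical rather than part of any package.

```python
from collections import defaultdict

# Hypothetical per-isoform RPKM values keyed by (gene, isoform).
resistant = {("geneA", "v1"): 5.0, ("geneA", "v2"): 2.0, ("geneB", "v1"): 0.02}
sensitive = {("geneA", "v1"): 2.0, ("geneA", "v2"): 1.5, ("geneB", "v1"): 0.001}


def sum_by_gene(isoform_rpkm):
    """Collapse isoform-level RPKM values into one summed value per gene."""
    totals = defaultdict(float)
    for (gene, _isoform), rpkm in isoform_rpkm.items():
        totals[gene] += rpkm
    return dict(totals)


def fold_changes(sample_a, sample_b, min_mean=1.0):
    """Fold change (a relative to b) per gene, keeping the mean expression.

    Genes whose mean RPKM is below min_mean (an arbitrary, assumed cutoff)
    are skipped, as are genes with zero expression in sample_b.
    """
    a, b = sum_by_gene(sample_a), sum_by_gene(sample_b)
    results = {}
    for gene in a.keys() & b.keys():
        mean_expr = (a[gene] + b[gene]) / 2
        if mean_expr < min_mean or b[gene] == 0:
            continue
        results[gene] = {"fold_change": a[gene] / b[gene], "mean_rpkm": mean_expr}
    return results


# geneB has a large ratio but is dropped by the mean-expression filter.
print(fold_changes(resistant, sensitive))
```

The cutoff is something you would have to choose by looking at your own data; with a single sample per condition there is no statistical basis for it.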
Can you be more specific please? How can I interpret the RPKM values to calculate the fold changes?
The fold change is just their ratio. So if you have an RPKM of 5 in one sample and 2 in the other, then the fold change is 5/2 = 2.5 (or 2/5 = 0.4, depending on which sample you want things relative to). If you want log2 fold changes, then just log2-transform both values and subtract.
How do I decide which RPKM value to use for each gene when calculating the fold change? Can I just use the highest RPKM value for each gene in a sample?
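In Python, that arithmetic could look like the sketch below. The function names are hypothetical, and the optional `pseudocount` argument is my own addition for avoiding log2(0) on genes with zero RPKM; it defaults to 0 so the numbers match the example above.

```python
import math


def fold_change(rpkm_a, rpkm_b):
    """Ratio of sample a to sample b."""
    return rpkm_a / rpkm_b


def log2_fold_change(rpkm_a, rpkm_b, pseudocount=0.0):
    """log2-transform each value, then subtract (a relative to b)."""
    return math.log2(rpkm_a + pseudocount) - math.log2(rpkm_b + pseudocount)


print(fold_change(5, 2))                 # 2.5
print(fold_change(2, 5))                 # 0.4
print(round(log2_fold_change(5, 2), 2))  # 1.32
```

Swapping the two arguments only flips the sign of the log2 fold change, which is one reason the log scale is convenient.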
Either just sum the isoforms or take the median.
Ok thanks a lot.
Actually I also know the ratio of each transcript variant of a gene. So this means I have the contribution rate of each transcript variant of a gene to the overall expression of that gene. Can I normalize the RPKM value for each transcript variant depending on its ratio and then get the overall RPKM for one gene?
For example, variant 1 of gene A contributes 60% and has an RPKM of 5, and variant 2 of gene A contributes 40% and has an RPKM of 2. The overall RPKM for gene A would then be (5 × 0.6) + (2 × 0.4) = 3.8.
Something like that should work.
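A minimal sketch of that weighted sum, assuming the contribution fractions for a gene are given as values that add up to 1 (the function name and the input layout are hypothetical):

```python
def weighted_gene_rpkm(variants):
    """Combine (rpkm, fraction) pairs for one gene into a weighted RPKM.

    Assumes the fractions of a gene's variants sum to 1.
    """
    total_fraction = sum(fraction for _rpkm, fraction in variants)
    if abs(total_fraction - 1.0) > 1e-6:
        raise ValueError(f"fractions sum to {total_fraction}, expected 1")
    return sum(rpkm * fraction for rpkm, fraction in variants)


# Gene A from the example: variant 1 at 60% with RPKM 5, variant 2 at 40% with RPKM 2.
gene_a = [(5.0, 0.6), (2.0, 0.4)]
print(round(weighted_gene_rpkm(gene_a), 2))  # 3.8
```

Whichever way you collapse the variants, use the same method in both samples before taking the ratio.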
This way, I think the resulting RPKM value for each gene would be more reliable. Thanks again for your help.