10.9 years ago
skm770
RPKM and FPKM values vary a lot, sometimes from 10^-6 to 10^6. I have seen people compare methylation with RNA-seq. Since methylation usually lies between 0 and 1, people scale the RPKM values to between 0 and 1. I don't know how people do that and was wondering whether there are any R packages available to do the same.
Thanks
Bizarre question... people compare methylation with RNA-seq? Do you mean associate methylation with expression? The fact that methylation values range between 0 and 1 (not always, it depends on what metric you use) doesn't mean RNA-seq data needs to be in the same range. Yes, RPKM values can vary a lot (both within and between samples). More importantly, they are not normally distributed, but log-transformed RPKM values (do you mean FPKM?) are approximately normal, so start with that. However, I think you should contact a local bioinformatician to help you out. If you still want to scale RPKM values, have a look at the scale() function in R (note that by default it centers each column and divides by its standard deviation; it does not map values to the 0-1 range). It's part of base R, so there is no need to install any packages.
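To make the difference between these transformations concrete, here is a minimal sketch in plain Python (not R, and not anyone's actual pipeline). The function names, the pseudocount of 1 and the toy FPKM values are all assumptions made for illustration. `z_score` mirrors what R's scale() does by default, while `min_max` is the transformation that actually lands in the 0-1 range.

```python
import math
from statistics import mean, stdev


def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount (an assumed choice) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def min_max(values):
    """Rescale linearly so the smallest value becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical FPKM values for one gene across four samples.
fpkm = [0.0, 1.0, 3.0, 1023.0]
logged = log2_transform(fpkm)   # [0.0, 1.0, 2.0, 10.0]
scaled = min_max(logged)        # [0.0, 0.1, 0.2, 1.0]
standardized = z_score(logged)  # mean 0, sample sd 1, but not bounded to 0-1

print(logged)
print(scaled)
```

Taking the log first matters here: min-max scaling the raw FPKM values would squeeze the first three samples to almost exactly 0 because of the single large value.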
What I meant by comparison is that they look at how expression changes with methylation. The usual way is to draw box plots of RNA-seq expression against methylation to see general patterns, dividing the genes according to their methylation level (10%, 20%, ... 100%).
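The grouping behind such a box plot can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the gene names, methylation fractions and log2 expression values are invented, and `methylation_bin` is a hypothetical helper, not a function from any package. It only computes the per-bin medians; the actual plotting is left out.

```python
from collections import defaultdict
from statistics import median


def methylation_bin(beta, n_bins=10):
    """Upper edge (in %) of the bin that a methylation fraction in [0, 1] falls into."""
    index = min(int(beta * n_bins), n_bins - 1)  # a value of exactly 1.0 goes in the top bin
    return (index + 1) * 100 // n_bins


# Hypothetical genes: name -> (methylation fraction, log2 expression).
genes = {
    "g1": (0.05, 6.0),
    "g2": (0.08, 7.0),
    "g3": (0.45, 4.0),
    "g4": (0.41, 3.0),
    "g5": (0.92, 1.0),
    "g6": (0.97, 0.5),
}

# Collect the expression values of all genes that share a methylation bin.
expression_by_bin = defaultdict(list)
for beta, expression in genes.values():
    expression_by_bin[methylation_bin(beta)].append(expression)

# One summary value per bin; a box plot would show the full distribution instead.
summary = {b: median(v) for b, v in sorted(expression_by_bin.items())}
print(summary)
```

In this toy data the median expression drops as the methylation bin goes up, which is the kind of general pattern the box plots are meant to reveal.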
OK, so I guess you have whole-genome expression and methylation data and you want to know which genes are epigenetically regulated? Why do you want to know that? If a gene is epigenetically regulated, does that make it more interesting? I'm not just giving you a hard time; I'm trying to get things clear so you get the maximum response to your question.
Yes, we would like to see whether a gene is epigenetically regulated or not, or in this particular case, what the pattern of expression and methylation is.
Then, for each gene, perform a linear regression of methylation values on expression values across your complete cohort. Make sure your expression values are approximately normally distributed (e.g. use log2(FPKM)). Apply a false discovery rate adjustment to the resulting p-values and keep the genes with an FDR-adjusted p-value < 0.05 and a negative slope (meaning more methylation is associated with less expression). If you want, you can then use these genes for hierarchical clustering.
Can you explain how to "for each gene, perform a linear regression of methylation-values on expression values of your complete cohort"? I'm very interested, thanks.
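One possible reading of that recipe, as a minimal sketch in plain Python rather than a definitive implementation: for every gene, take its methylation values and its log2 expression values across all samples, fit a straight line, get a p-value, then adjust the p-values across genes. Several things here are assumptions: the toy cohort is invented, expression is treated as the response and methylation as the predictor, and the p-value comes from a permutation test as a stand-in for the t-test that R's lm() would report. The Benjamini-Hochberg step follows the same procedure as R's p.adjust(method = "BH").

```python
import math
import random


def slope_and_r(x, y):
    """Least-squares slope of y on x, plus the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(slope_and_r(x, y)[1])
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope_and_r(x, shuffled)[1]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment of a list of p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank among the ascending p-values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical cohort of 8 samples: gene -> (methylation fractions, log2 expression).
methylation = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cohort = {
    "geneA": (methylation, [9.1, 8.0, 7.2, 5.9, 5.1, 4.2, 2.8, 2.0]),  # falls with methylation
    "geneB": (methylation, [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]),  # no linear trend
    "geneC": (methylation, [2.1, 2.9, 4.2, 5.0, 6.1, 6.8, 8.2, 9.0]),  # rises with methylation
}

names = list(cohort)
fits = [slope_and_r(m, e) for m, e in cohort.values()]
pvals = [permutation_p(m, e) for m, e in cohort.values()]
fdrs = bh_adjust(pvals)
results = {
    name: {"slope": fit[0], "p": p, "fdr": q}
    for name, fit, p, q in zip(names, fits, pvals, fdrs)
}

# Keep genes with FDR < 0.05 and a negative slope.
significant = [n for n, r in results.items() if r["fdr"] < 0.05 and r["slope"] < 0]
print(significant)
```

With real data you would have thousands of genes and would probably want a proper regression p-value instead of a permutation test. In R, the same loop could be written around lm() and p.adjust(), with the genes as rows of two matched matrices.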