Dear all,
I am trying to perform methylation-gene expression correlation analysis in R.
After removing probes not mapped to any gene symbol, around 300,000 probes remained.
With the large number of samples (280), the correlation analysis generates a huge adjacency matrix that cannot be opened in Excel.
Since several probes correspond to each gene in the methylation data, I have to perform all-against-all pairwise correlations and then filter the results.
I searched for a way to filter out the significant negative correlations; however, after a couple of hours it is still running. I have already increased the memory available to R using the memory.limit() function.
Is there any way to do this task on my laptop (with 16 GB of RAM)? (I do not have access to a compute server right now.)
I would appreciate any help
Nazanin
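One way this could be made to fit on a laptop is to skip the all-against-all matrix and correlate each probe only with the gene it is annotated to. Below is a minimal sketch on simulated data, not a tested pipeline. The object names (expr, meth, probe2gene) and the helper row_cor() are hypothetical, and it assumes both matrices have samples in columns with matching sample names. With about 300,000 probe-gene pairs and 280 samples, each aligned matrix is a few hundred megabytes rather than a full adjacency matrix.

```r
# Simulated stand-ins for the real data (all object names are hypothetical).
set.seed(1)
n_samples <- 280
genes   <- paste0("gene", 1:50)
probes  <- paste0("cg", 1:200)
samples <- paste0("s", 1:n_samples)

# expr: genes x samples (log-transformed expression)
expr <- matrix(rnorm(length(genes) * n_samples), nrow = length(genes),
               dimnames = list(genes, samples))
# meth: probes x samples (methylation values between 0 and 1)
meth <- matrix(runif(length(probes) * n_samples), nrow = length(probes),
               dimnames = list(probes, samples))
# Annotation: one gene symbol per probe (several probes may share a gene)
probe2gene <- data.frame(probe = probes,
                         gene  = sample(genes, length(probes), replace = TRUE),
                         stringsAsFactors = FALSE)

# Keep only pairs where both the probe and the gene are present in the data
keep  <- probe2gene$probe %in% rownames(meth) &
         probe2gene$gene  %in% rownames(expr)
pairs <- probe2gene[keep, ]

# Pearson correlation between matching rows of two equally sized matrices
row_cor <- function(x, y) {
  x <- x - rowMeans(x)
  y <- y - rowMeans(y)
  rowSums(x * y) / sqrt(rowSums(x^2) * rowSums(y^2))
}

# Align rows by pair and columns by sample name, then correlate row by row
m <- meth[pairs$probe, , drop = FALSE]
e <- expr[pairs$gene, colnames(m), drop = FALSE]
pairs$r <- row_cor(m, e)

# Two-sided p-values from the t distribution, then Benjamini-Hochberg FDR
n     <- ncol(m)
tstat <- pairs$r * sqrt((n - 2) / (1 - pairs$r^2))
pairs$p   <- 2 * pt(-abs(tstat), df = n - 2)
pairs$fdr <- p.adjust(pairs$p, method = "BH")

# Candidate negative correlations (the 0.05 FDR cut-off is an assumption)
neg <- pairs[pairs$r < 0 & pairs$fdr < 0.05, ]
```

The result is a long table with one row per probe-gene pair, which can be written out with write.csv() and filtered without ever building the full matrix. If memory is still tight, the same steps could be run on blocks of rows of pairs.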
Hi Kevin, I managed to solve the problem in R a few weeks ago; however, I found only one significant negative correlation! It is strange, isn't it?
Yes, it would seem strange, as methylation is supposed to decrease gene expression (?)
I used log-transformed htseq-count data for gene expression and the beta values of methylation for the corresponding genes, and computed Pearson correlations in R. I then used r <= -0.5 as the cut-off for selecting significant negative correlations. When I had previously compared the deregulated genes with the differentially methylated genes, I found a few genes with an inverse relationship (up-regulated and hypomethylated, or down-regulated and hypermethylated), but I could not find any of those genes in the correlation analysis!
Maybe encode the methylation as binary (methylated | not methylated) and then do binary logistic regression with each gene?
You normalised the HTSeq data, correct?
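For what it is worth, the binary encoding idea could be sketched in R roughly as follows, for a single probe-gene pair. The data are simulated, the variable names are hypothetical, and the 0.5 cut-off for calling a probe "methylated" is purely an assumption for illustration; a sensible threshold would depend on the data.

```r
# Simulated data for one probe-gene pair (all names are hypothetical).
set.seed(42)
n_samples  <- 280
expr_gene  <- rnorm(n_samples)                              # normalised, log-scale expression
beta_probe <- plogis(-1.5 * expr_gene + rnorm(n_samples))   # methylation values in (0, 1)

# Encode methylation as binary; the 0.5 cut-off is an assumption
cutoff     <- 0.5
methylated <- as.integer(beta_probe > cutoff)

# Binary logistic regression of methylation status on expression
fit   <- glm(methylated ~ expr_gene, family = binomial())
coefs <- summary(fit)$coefficients

# Slope (log-odds of being methylated per unit of expression) and its p-value
coefs["expr_gene", c("Estimate", "Pr(>|z|)")]
```

A negative slope would mean that higher expression goes with lower odds of the probe being methylated. Repeating this over many pairs would again need multiple-testing correction, for example with p.adjust().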
I am not familiar with this, but I will try to find a way to do binary logistic regression.