Question

Correlating DNA methylation beta values with Gene expression levels (RNA-seq)

1

Entering edit mode

2.6 years ago

sswang25 ▴ 20

I am currently trying to write code in R to analyse the relationship between DNA methylation values at probe sites and gene expression levels. I have DNA methylation data generated from the Illumina Epic array for 50 patients. I also have these patients normalized RNA-seq gene expression levels in VST form. I have 2 dataframes:

DF 1 contains the beta values for all the methylation probes for all 50 patients. Patients are columns and probes are rows

DF 2 contains the VST data for all 50 patients - patients are columns and genes are rows

I would like to test the correlation (just using spearmans rank) between each individual probe methylation value and each individual gene's expression level for all the patients.

I hope to produce a matrix of R values for each probe/gene combination which I can then use to plot either a volcano plot or MA plot.

I am very new to R and therefore I don't know how to code something like this. The reason I am doing every single correlation is because I am hoping that this catch all approach might provide previously unknown relationships.

correlation methylation DNA RNA-seq R • 1.8k views

ADD COMMENT • link updated 2.5 years ago by Hamid Ghaedi 3.3k • written 2.6 years ago by sswang25 ▴ 20

score 1 · Answer 1 · 2022-05-03

In a similar situation, I would suggest limiting the number of probes and genes considering some metrics like being differentially methylated or being differentially expressed. Otherwise, you will end up in a situation doing a test thousands/millions of times! In your case, if you include all probes and all genes then you would need at least 16,000,000,000 times (800 K probes x 20 K genes) to do the correlation analysis.
I have prepared a tutorial for doing different types of analysis on methylation data and they are available here and here. Check them out. You may find the section "combined analysis" and I guess that is kind of what you are looking for.

Anyway to answer your question and give you some insight on code; [UPDATED on 2022-05-16]

# toy version of the input data
DF1 = data.frame(pat1meth = runif(10, min=0, max=1),
                 pat2meth = runif(10, min = 0, max = 1),
                 pat3meth = runif(10, min = 0, max = 1),
                 row.names = paste0('probe', seq(1:10)))

DF2 = data.frame(pat1exp = runif(10, min=5, max=15),
                 pat2exp = runif(10, min = 5, max = 25),
                 pat3exp = runif(10, min = 5, max = 30),
                 row.names = paste0('gene', seq(1:10)))


# creating an empty dataframe to store for loops results
corRes = data.frame(probe=character(0),gene=character(0), pval=numeric(0), cor=numeric(0))

for(p in 1:nrow(DF1)){ # DF1 contains methylation data and probes are in rows
        pr = as.numeric(DF1[p,])
        probe = rownames(DF1)[p]
        print(probe)
        for(g in 1:nrow(DF2)){ # DF2 contains expression data and genes are in rows
                gene = rownames(DF2)[g]
                print(gene)
                ge = as.numeric(DF2[g,])
                # corelation calculation
                res <- cor.test(pr,ge, method = "spearman")
                # saving pvalue
                pval = round(res$p.value, 4)
                # corr coeff
                cor = round(res$estimate, 4)
                # putting data into the result dataframe
                idx = nrow(corRes)+1
                corRes[idx,] <- c(probe,gene, pval, cor)
        }
}

Result

corRes
      probe   gene   pval  cor
1    probe1  gene1      1 -0.5
2    probe1  gene2 0.3333   -1
3    probe1  gene3      1 -0.5
4    probe1  gene4      1 -0.5
5    probe1  gene5      1 -0.5
6    probe1  gene6      1 -0.5
7    probe1  gene7      1  0.5
8    probe1  gene8      1 -0.5
9    probe1  gene9      1  0.5
10   probe1 gene10      1  0.5
11   probe2  gene1      1  0.5
12   probe2  gene2      1 -0.5
13   probe2  gene3      1  0.5
14   probe2  gene4      1  0.5
15   probe2  gene5      1  0.5
16   probe2  gene6      1  0.5