Question

Correlation between methylation regression estimates and gene expression logFC

7.4 years ago

Bioinformatist Newbie

Hi,

I have a dataframe with gene names, regression estimates for 5mC methylation data, and logFC values computed by limma. For the regression estimates, a positive value indicates hypermethylation and a negative value indicates hypomethylation in the disease group; I originally had these estimates for each CpG site and averaged them at the gene level. For logFC, a positive value means the gene is up-regulated in disease and a negative value means it is down-regulated in the disease state. This is what my dataframe looks like:

> data[1:3,]
  Gene    Reg_Beta       logFC
1 A1BG 0.012759505 -0.01594659
2 A1CF 0.003407954  0.01044036
3  A2M 0.004816774  0.37067536
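The question says the CpG-level estimates were averaged at the gene level. A minimal Python sketch of that step, assuming a simple unweighted mean per gene (in R, `tapply(estimate, gene, mean)` does the same job). The CpG IDs, gene names and values are made up for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CpG-level regression estimates: (CpG ID, gene, estimate).
# IDs, gene names and values are made up for illustration.
cpg_estimates = [
    ("cg_a", "GENE_X", 0.020),
    ("cg_b", "GENE_X", 0.010),
    ("cg_c", "GENE_Y", -0.030),
]


def average_by_gene(rows):
    """Collapse CpG-level estimates into one unweighted mean per gene."""
    by_gene = defaultdict(list)
    for _cpg_id, gene, estimate in rows:
        by_gene[gene].append(estimate)
    return {gene: mean(values) for gene, values in by_gene.items()}


gene_reg_beta = average_by_gene(cpg_estimates)
print(gene_reg_beta)
```

Other summaries (median, or weighting by CpG coverage) would be equally valid choices; the mean is used here only because the question mentions averaging.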

Can anybody advise whether I can compute a correlation between Reg_Beta (the averaged beta value for the methylation status of a gene) and logFC (the expression change of that gene) at the gene level? In the end, I would like to identify the genes whose methylation is strongly anti-correlated with their expression.

I am a newbie to methylation analysis, so any constructive suggestions or comments will be highly appreciated! Thanks.

5mC correlation methylation geneExpression

To compute a correlation you need more than one data point per variable. For each gene you have only one value in group A (Reg_Beta) and one value in group B (logFC). A proper correlation requires a set of paired points for both A and B.

7.4 years ago by Benn
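To illustrate the point about needing more than one data point, here is a hand-written Pearson correlation in Python (a minimal sketch; R's `cor()` computes Pearson correlation by default). With a single (Reg_Beta, logFC) pair for a gene, as for A1BG in the table in the question, there is nothing to correlate:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of paired values."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined: one variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# A1BG from the question: one Reg_Beta value and one logFC value.
try:
    print(pearson([0.012759505], [-0.01594659]))
except ValueError as err:
    print("A1BG:", err)  # prints "A1BG: need at least two paired points"
```

Correlating Reg_Beta with logFC across all genes would be possible, but it gives one genome-wide coefficient rather than a value per gene.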

Thank you for your comment. Suppose I take the original beta values (averaged at the gene level) for the disease group and for the healthy group, and add log2-normalized expression values (at the gene level) for the diseased and healthy samples. How would I then get what I am looking for? For example, column 1 is the gene name, columns 2:35 are beta values of diseased samples, columns 36:70 are beta values of healthy samples, columns 71:91 are expression values of diseased samples, and columns 92:102 are expression values of healthy samples. Can you guide me on how to design the comparisons in this case so that the results make sense and give me what I am looking for?

7.4 years ago by Bioinformatist Newbie
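Counting the columns in the layout described in the comment above (R-style ranges, 1-based and inclusive) shows that the methylation and expression groups contain different numbers of samples. A small Python sketch of that arithmetic, using only the column ranges given in the question; the `row` used at the end is hypothetical:

```python
# Column ranges from the question, R-style: 1-based and inclusive.
layout = {
    "beta_disease": (2, 35),
    "beta_healthy": (36, 70),
    "expr_disease": (71, 91),
    "expr_healthy": (92, 102),
}


def n_columns(first, last):
    """Number of columns in the inclusive range first:last."""
    return last - first + 1


def select(row, first, last):
    """Python slice equivalent of R's row[first:last] (1-based, inclusive)."""
    return row[first - 1:last]


counts = {group: n_columns(*span) for group, span in layout.items()}
print(counts)
# {'beta_disease': 34, 'beta_healthy': 35, 'expr_disease': 21, 'expr_healthy': 11}

for status in ("disease", "healthy"):
    print(f"{status}: {counts['beta_' + status]} beta vs "
          f"{counts['expr_' + status]} expression columns")

# Hypothetical row whose cells hold their own column numbers, 1..102.
row = list(range(1, 103))
healthy_expr = select(row, *layout["expr_healthy"])
print(healthy_expr[0], healthy_expr[-1])  # 92 102
```

Because 34 methylation columns face 21 expression columns for the diseased samples (and 35 vs 11 for healthy), the columns cannot simply be paired position by position.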

For a correlation you need the same number of values in group A as in group B, and the values must be paired. This pairing needs to be meaningful (not random).
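One way to get meaningful pairs is to match methylation and expression values by sample ID, so that each point comes from the same individual, and then correlate gene by gene across those shared samples. A minimal Python sketch under that assumption: the sample IDs, gene names, values, `min_pairs` and the -0.7 cut-off are all hypothetical, and which samples to include (disease, healthy, or both) is a design choice this sketch does not make:

```python
import math


def pearson(x, y):
    """Pearson correlation; returns None when it is undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def per_gene_correlation(methylation, expression, min_pairs=3):
    """Correlate beta values with expression across samples, gene by gene.

    methylation, expression: {gene: {sample_id: value}}. Only samples present
    in both tables are used, so every point is a same-sample pair.
    """
    results = {}
    for gene in sorted(methylation.keys() & expression.keys()):
        shared = sorted(methylation[gene].keys() & expression[gene].keys())
        if len(shared) < min_pairs:
            continue  # too few paired samples for a correlation
        r = pearson([methylation[gene][s] for s in shared],
                    [expression[gene][s] for s in shared])
        if r is not None:
            results[gene] = r
    return results


# Hypothetical toy data: sample S5 has expression but no methylation value.
methylation = {
    "GENE_X": {"S1": 0.2, "S2": 0.4, "S3": 0.6, "S4": 0.8},
    "GENE_Y": {"S1": 0.5, "S2": 0.6, "S3": 0.7},
}
expression = {
    "GENE_X": {"S1": 9.0, "S2": 8.0, "S3": 7.0, "S4": 6.0, "S5": 5.0},
    "GENE_Y": {"S1": 4.0, "S2": 5.0, "S3": 6.0},
}

results = per_gene_correlation(methylation, expression)
# Hypothetical cut-off for "strongly anti-correlated".
anti_correlated = {g: r for g, r in results.items() if r <= -0.7}
print(sorted(anti_correlated))  # ['GENE_X']
```

With real data you would probably also want a per-gene significance test (in R, `cor.test()`) and a multiple-testing correction such as `p.adjust()`, since a correlation over a handful of samples is very noisy.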

I am not sure what your Reg_Beta values are and how they link to your expression values. Are these paired?

If you plot A on the x-axis and B on the y-axis, each pair of values becomes one point. The correlation then describes how well all these points fit a line.
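The description of correlation as how well the points fit a line can be checked numerically: for Pearson correlation, r² equals the fraction of the variance in y explained by the least-squares line (R²). A small Python sketch with made-up toy values:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def r_squared(x, y):
    """Fraction of the variance in y explained by the fitted line."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy paired values (made up): each (x[i], y[i]) is one point on the plot.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson(x, y)
print(round(r, 3), round(r ** 2, 3), round(r_squared(x, y), 3))  # 0.8 0.64 0.64
```

A value of r close to -1 would mean the points lie close to a line with negative slope, which is what "anti-correlated" refers to in the question.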

Hope this helps; if not, please read e.g. the Wikipedia article on correlation.

7.4 years ago by Benn