I have a list of genes and want to test whether the expression level of the genes in this list correlates with their DNA methylation level. I want to test this hypothesis in the TCGA breast cancer data. Below is my plan.
Plan A:
- Extract the expression and methylation level for each gene in my list. Expression can be defined as RPKM from RNA-seq data, and methylation level can be taken from the probes in the promoter region of the gene (from -3 kb to +500 bp around the TSS; if there are multiple probes in this region, I would average their values to get the final methylation level for the gene).
- Calculate the correlation between these two measures (e.g. the Pearson correlation coefficient). If the P-value is significant, I can say that there is a significant correlation between them.
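The two steps of Plan A could be sketched roughly as below. This is only an illustration under assumptions: the probe IDs, coordinates, beta values and RPKM values are invented toy numbers, and the function names (`promoter_probes`, `mean_beta`, `pearson`) are hypothetical helpers, not part of any TCGA tool. The window is taken relative to the direction of transcription, so the strand matters.

```python
from math import sqrt

def promoter_probes(probes, tss, strand, upstream=3000, downstream=500):
    """Return IDs of probes lying from `upstream` bp before to `downstream` bp after the TSS."""
    selected = []
    for probe_id, pos in probes.items():
        # Signed offset from the TSS, measured in the direction of transcription.
        offset = (pos - tss) if strand == "+" else (tss - pos)
        if -upstream <= offset <= downstream:
            selected.append(probe_id)
    return selected

def mean_beta(beta_by_probe, probe_ids):
    """Average the beta values of the selected probes for one sample."""
    values = [beta_by_probe[p] for p in probe_ids]
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy probe positions for one gene (assumed values, not real annotation).
probes = {"cg_a": 9000, "cg_b": 10200, "cg_c": 10800}
selected = promoter_probes(probes, tss=10000, strand="+")

# Toy beta values per sample and the matching RPKM for the same gene.
beta_by_sample = [
    {"cg_a": 0.80, "cg_b": 0.60, "cg_c": 0.10},
    {"cg_a": 0.50, "cg_b": 0.40, "cg_c": 0.20},
    {"cg_a": 0.20, "cg_b": 0.30, "cg_c": 0.15},
    {"cg_a": 0.10, "cg_b": 0.10, "cg_c": 0.05},
]
rpkm = [2.0, 5.5, 9.0, 14.0]

# One averaged promoter methylation value per sample, then correlate across samples.
meth = [mean_beta(sample, selected) for sample in beta_by_sample]
r = pearson(meth, rpkm)
print(selected, meth, r)
```

The sketch only computes the coefficient. A P-value could be obtained from something like `scipy.stats.pearsonr`, which is left out here to keep the example dependency-free.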
Plan B:
- Calculate the z-score of gene expression for each gene (z-score = (value - mean of normal samples) / SD of normal samples).
- Calculate the z-score of the methylation level for each gene (z-score = (value - mean of normal samples) / SD of normal samples), using probes from -3 kb to +500 bp around the TSS. If there are multiple probes in this region, I would average their values first and then calculate the z-score.
- Calculate the correlation coefficient as described in Plan A.
Which would be better? If you have suggestions, please tell me.
Thanks
Standardizing the data (i.e. a z-score transformation) is a linear transformation, and Pearson's correlation is unaffected by linear transformations of the variables (the scaling here is by a positive SD, so even the sign is preserved). You'll get the same result whether you use the raw data or the standardized data.
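A quick way to convince yourself of this is to compute the coefficient both ways. The sketch below uses invented toy values for one gene (tumor and normal samples are assumptions for illustration only) and a hand-written `pearson` helper so that it runs without extra packages.

```python
from math import sqrt
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def z_scores(values, reference):
    """z-score each value against the mean and SD of a reference (normal) group."""
    ref_mean, ref_sd = mean(reference), stdev(reference)
    return [(v - ref_mean) / ref_sd for v in values]

# Toy values for a single gene (assumed, not real TCGA data).
tumor_expr = [5.1, 7.4, 2.2, 9.8, 4.0]
tumor_meth = [0.62, 0.41, 0.85, 0.20, 0.70]
normal_expr = [6.0, 6.5, 5.5, 7.0]
normal_meth = [0.30, 0.35, 0.25, 0.40]

# Plan A: correlate the raw values.
r_raw = pearson(tumor_expr, tumor_meth)

# Plan B: z-score both variables against the normals, then correlate.
r_z = pearson(z_scores(tumor_expr, normal_expr),
              z_scores(tumor_meth, normal_meth))

print(r_raw, r_z)  # the two coefficients agree up to floating-point error
```

Note that this holds when each variable is shifted and scaled by a single mean and SD, as it is within one gene.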
Do you really need this on a global level or would per-gene comparisons work? That'd be much more meaningful.
Hi Devon,
I am looking at a similar exploration to tujuchuanli's. Would you mind explaining what you meant by per-gene comparisons and what that would look like?
You're more likely to see a coherent relationship between methylation and gene expression if you look at individual genes rather than globally. The relationship (think slope) won't be the same between genes, so if you pool everything you'll probably be left with a big blob of dots and no way to coherently fit things.
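As a rough illustration of the difference, the sketch below computes one correlation per gene across samples and then one pooled correlation over all genes at once. The gene names and numbers are invented toy values chosen so that each gene has a clean negative relationship but a different slope and baseline. It is an assumption-laden example, not real data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: promoter methylation and expression across four samples per gene.
data = {
    "GENE_A": {"meth": [0.1, 0.2, 0.3, 0.4], "expr": [10.0, 8.0, 6.0, 4.0]},
    "GENE_B": {"meth": [0.5, 0.6, 0.7, 0.8], "expr": [101.0, 100.5, 100.0, 99.5]},
}

# Per-gene comparison: one coefficient per gene, computed across samples.
per_gene_r = {gene: pearson(v["meth"], v["expr"]) for gene, v in data.items()}

# Global comparison: pool every (methylation, expression) point from all genes.
pooled_meth = [m for v in data.values() for m in v["meth"]]
pooled_expr = [e for v in data.values() for e in v["expr"]]
pooled_r = pearson(pooled_meth, pooled_expr)

print(per_gene_r)  # both genes are strongly negative
print(pooled_r)    # pooled value is dominated by between-gene differences
```

In this toy case each gene is strongly negatively correlated on its own, while the pooled coefficient comes out positive, because the differences in baseline between genes swamp the within-gene trend.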
Yes, I need this. What I am talking about is that the expression level of the genes in my list could be controlled by their DNA methylation level. This is the only way I know of (I know it from reading papers; it can be viewed in a scatter plot). If you know a better way, please tell me. Thanks