Correlation between methylation (450K) and gene expression (RNA-Seq)
9.0 years ago

Hi,

I wonder how to correlate 450K methylation data with RNA-Seq data. The major issue for me is that I have several probes per gene (in the promoter, the gene body, and the 5' and 3' UTRs), but only one expression value (from the RNA-Seq data).

For example, I have 10 probes for one gene (ELF3). Several probes have a significant adjusted p-value and others do not, and this gene is differentially expressed in my RNA-Seq data. Do I have to group the probes by ucscRefGene_GROUP (TSS1500, TSS200, 1stExon, gene body, 5'UTR, 3'UTR, ...) and take a mean p-value per group (which seems a little odd)?

Some advice/comments/ideas?

Thanks


Hi, did you ever find a solution to this? Please help!


I don't know. What is the hypothesis? Could methylation at a promoter be sufficient to reduce expression of the gene, irrespective of methylation at other sites of the same gene? I think you could build gene-to-probes models, in which you regress the expression of the gene on the methylation values of its probes:

lm(GeneExpression ~ probe1 + probe2 + probe3)
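
To make that one-liner concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the sample size, the three probe columns, and the simulated effect of `probe1` on expression. With real data, `df` would hold one row per sample, with the beta (or M) values of the gene's probes and its normalised expression value.

```r
# Simulated data: 40 samples, 3 probes (beta values) and one expression value.
set.seed(1)
n <- 40
df <- data.frame(
  probe1 = runif(n),  # hypothetical promoter probe
  probe2 = runif(n),  # hypothetical gene-body probe
  probe3 = runif(n)   # hypothetical 3'UTR probe
)
# Assumption used only for the simulation: probe1 methylation lowers expression.
df$GeneExpression <- 8 - 3 * df$probe1 + rnorm(n, sd = 0.5)

# One model per gene: expression regressed on all of the gene's probes.
fit <- lm(GeneExpression ~ probe1 + probe2 + probe3, data = df)

# Estimate, standard error, t value and p-value for each probe.
coefs <- summary(fit)$coefficients
print(coefs)
```

Each row of `coefs` then describes one probe's association with expression, adjusted for the other probes of the same gene. Neighbouring probes are often highly correlated, which can make the individual coefficients unstable.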
6.1 years ago

It took me a second to realize the update was a comment (so, I don't know if the original user is still having an issue). However, for the sake of discussion, I'll throw some ideas out there:

1) I would like to see consistent signal from multiple probes / sites. So, if you have a way to determine that there is a region with a consistent methylation trend at multiple nearby sites, I would use some sort of measurement for that region (although precisely what to use can vary somewhat between projects).

2) In the case of suggestion #1, you would have one value per region (rather than per probe / site). However, if you are familiar with R and you have a lot of sites / probes to compare, you may want to see whether you can implement the tests with something based on C++ (to reduce the run time of the many separate tests).
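
As a rough sketch of suggestion #1, the probes of a gene can be averaged per annotated region and each region average correlated with expression. The data below are simulated, the probe IDs and region labels are made up, and the choice of a plain mean with a Spearman test is an assumption rather than a recommendation (as noted above, the right summary can vary between projects).

```r
set.seed(2)
n <- 30
samples <- paste0("S", seq_len(n))

# Hypothetical annotation: 6 probes of one gene and their region labels.
region <- c("TSS200", "TSS200", "TSS1500", "Body", "Body", "3UTR")
probes <- paste0("cg", seq_along(region))  # made-up probe IDs

# One expression value per sample (e.g. log-scale normalised counts).
expr <- rnorm(n, mean = 8)
names(expr) <- samples

# Probe x sample matrix of beta values.
beta <- matrix(runif(length(probes) * n), nrow = length(probes),
               dimnames = list(probes, samples))
# Simulation only: make the TSS200 probes track expression negatively.
for (i in which(region == "TSS200")) {
  beta[i, ] <- plogis(-2 * (expr - mean(expr)) + rnorm(n, sd = 0.3))
}

# One value per region and sample: the mean beta of the probes in that region.
region_mean <- apply(beta, 2, function(b) tapply(b, region, mean))

# Spearman correlation of each region average with expression.
res <- do.call(rbind, lapply(rownames(region_mean), function(r) {
  ct <- cor.test(region_mean[r, ], expr, method = "spearman")
  data.frame(region = r, rho = unname(ct$estimate), p.value = ct$p.value)
}))
res$adj.p <- p.adjust(res$p.value, method = "BH")
print(res)
```

This gives one correlation and one p-value per region, instead of averaging the p-values of individual probes.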

As one possible example, you can take a look at the COHCAP code for possible ideas: https://github.com/cwarden45/COHCAP/blob/master/R/COHCAP.site.R

This includes the fastLmPure() function in RcppArmadillo, which doesn't really involve knowing Rcpp/C++ (you just have to write some sort of wrapper in R).

Outside of COHCAP, it also looks like you can use the fastLm() function in much the same way as the lm() function: https://stackoverflow.com/questions/33584389/difference-between-fastlm-and-fastlmpure-functions-from-rcpparmadillo

To be clear, in COHCAP the gene expression tests are for a limited number of regions, so I don't do this for comparing expression and methylation. However, if this sounds helpful, then you can get some more information about the alt.pvalue parameter in the documentation for the COHCAP.site() and COHCAP.avg.by.island() functions:
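
For illustration, an R wrapper around fastLmPure() could look roughly like the sketch below. This is not the COHCAP code: the helper name `fit_site`, the simulated matrices, and the base-R fallback via lm.fit() (used only if RcppArmadillo is not installed) are all assumptions made for this example.

```r
set.seed(3)
n <- 50
expr <- rnorm(n)                            # one expression value per sample
meth <- matrix(runif(200 * n), nrow = 200)  # 200 hypothetical sites x 50 samples

# Hypothetical helper: fit expr ~ intercept + methylation for a single site
# and return the slope with its two-sided p-value.
fit_site <- function(m, y) {
  X <- cbind(1, m)  # design matrix: intercept column plus methylation
  if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
    f <- RcppArmadillo::fastLmPure(X, y)
    est <- as.numeric(f$coefficients[2])
    se <- as.numeric(f$stderr[2])
    df <- f$df.residual
  } else {
    # Fallback in base R, so the sketch runs without RcppArmadillo.
    f <- lm.fit(X, y)
    df <- f$df.residual
    est <- as.numeric(f$coefficients[2])
    se <- sqrt(sum(f$residuals^2) / df * solve(crossprod(X))[2, 2])
  }
  t_stat <- est / se
  c(slope = est, p.value = 2 * pt(-abs(t_stat), df))
}

# One test per site (row); the result has one row per site.
res <- t(apply(meth, 1, fit_site, y = expr))
head(res)
```

The point of the wrapper is that fastLmPure() only returns coefficients, standard errors and residual degrees of freedom, so the t statistic and p-value have to be computed in R.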

https://www.bioconductor.org/packages/release/bioc/html/COHCAP.html


It took me a second to realize the update was a comment

Yeah, it was revived by Will, who probably found this via Google. It's good to provide answers to old threads nevertheless, as otherwise the Biostar bot will bump the unanswered question to the top of the pile.


Good point - hopefully, this will be useful to Will :)
