Hi Carlos,
Instead of computing a simple correlation, you could model the relationship between each epigenetic signal and the expression of the genes surrounding it. What do I mean by 'model'? I mean fit a linear regression model for each mark-gene pair, as follows:
lm(NearbyGene1 ~ mark1H3k27me2)
lm(NearbyGene2 ~ mark1H3k27me2)
lm(NearbyGene3 ~ mark1H3k27me2)
lm(NearbyGene4 ~ mark1H3k27me2)
...
lm(NearbyGene1 ~ mark2H3k27me2)
lm(NearbyGene2 ~ mark2H3k27me2)
...
lm(NearbyGene1 ~ mark3H3k27me2)
lm(NearbyGene2 ~ mark3H3k27me2)
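To make one of those models concrete, here is a minimal sketch on simulated data. The data frame, the values, and the assumed negative effect of the mark are made up for illustration; in your data, each column would be one gene's expression or one mark's signal measured across the same samples.

```r
# Simulated data (hypothetical): 20 samples, one mark and one nearby gene
set.seed(1)
n <- 20
dat <- data.frame(mark1H3k27me2 = runif(n, 0, 10))
# Assumption for the simulation: expression falls as the repressive mark rises
dat$NearbyGene1 <- 8 - 0.5 * dat$mark1H3k27me2 + rnorm(n, sd = 0.5)

# Model the gene's expression as a function of the mark's signal
fit <- lm(NearbyGene1 ~ mark1H3k27me2, data = dat)
coef(fit)  # intercept, and slope = change in expression per unit of mark signal
```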
You will have to set this up as a loop. To use model formulae in a loop, you can build the model equation as a string with paste() and then coerce it into a formula acceptable to the lm() function with as.formula().
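A rough sketch of that loop, assuming hypothetical gene and mark column names and random data. For simplicity it fits every gene against every mark; in practice you would restrict each mark to the genes near it.

```r
set.seed(2)
n <- 20
# Hypothetical column names; replace with your own genes and marks
genes <- c("NearbyGene1", "NearbyGene2", "NearbyGene3")
marks <- c("mark1H3k27me2", "mark2H3k27me2")
dat <- as.data.frame(matrix(rnorm(n * (length(genes) + length(marks))), nrow = n,
                            dimnames = list(NULL, c(genes, marks))))

results <- list()
for (g in genes) {
  for (m in marks) {
    # Build "gene ~ mark" as a string with paste(), then coerce it to a formula
    f <- as.formula(paste(g, "~", m))
    results[[paste(g, m, sep = "_")]] <- lm(f, data = dat)
  }
}
length(results)  # one model per gene-mark pair: 3 x 2 = 6
```

Storing the fitted models in a named list keeps each one retrievable by its gene-mark label for later inspection.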
To extract information from a model, use the summary() function; individual values, such as a coefficient, its p-value, or the R-squared, can then be pulled out of its output by subsetting.
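For example, a short sketch on simulated data (the variable names are hypothetical). The coefficient table returned by summary() is a matrix, so you can index it by term and column name.

```r
set.seed(3)
dat <- data.frame(mark1H3k27me2 = runif(30, 0, 10))
dat$NearbyGene1 <- 5 - 0.3 * dat$mark1H3k27me2 + rnorm(30)
fit <- lm(NearbyGene1 ~ mark1H3k27me2, data = dat)

s <- summary(fit)
# Rows are model terms; columns are "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- s$coefficients
slope <- coefs["mark1H3k27me2", "Estimate"]
p_value <- coefs["mark1H3k27me2", "Pr(>|t|)"]
r_squared <- s$r.squared
c(slope = slope, p = p_value, r2 = r_squared)
```

Inside the loop, you could collect these three values per model into a data frame instead of keeping whole model objects.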
The benefit of using a model is that you can also adjust for other covariates or confounding factors, for example:
lm(NearbyGene1 ~ mark2H3k27me2 + TissueType)
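A minimal sketch of the adjusted model on simulated data. TissueType here is a hypothetical two-level factor, and the tissue effect is an assumption built into the simulation.

```r
set.seed(4)
n <- 40
dat <- data.frame(
  mark2H3k27me2 = runif(n, 0, 10),
  # Hypothetical covariate with two levels
  TissueType = factor(rep(c("liver", "brain"), each = n / 2))
)
# Simulation assumption: tissue shifts baseline expression on top of the mark's effect
dat$NearbyGene1 <- 6 - 0.4 * dat$mark2H3k27me2 +
  ifelse(dat$TissueType == "brain", 2, 0) + rnorm(n, sd = 0.5)

fit_adj <- lm(NearbyGene1 ~ mark2H3k27me2 + TissueType, data = dat)
# The mark's coefficient is now estimated holding tissue type constant
coef(summary(fit_adj))["mark2H3k27me2", ]
```

Because factor levels are sorted alphabetically, "brain" is the reference level here and the tissue term appears as TissueTypeliver.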
Take a look here for other information related to linear regression models (and there's tonnes of information across the World Wide Web, too): A: Resources for gene signature creation
Kevin
Hi Kevin, this is a great answer, thank you very much. How many genes would you consider testing for the "Nearby Gene" comparisons?
You could just begin with, literally, the nearest gene upstream and the nearest gene downstream of each H3K27 methylation site. If needed, you could extend this to include all genes within a larger locus around the site.
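Here is a base-R sketch of both options, using a made-up gene table and a single hypothetical site. It simplifies "upstream" and "downstream" to lower and higher genomic coordinates, ignoring strand, and the 100 kb window is an arbitrary choice. For real annotations, Bioconductor's GenomicRanges (precede(), follow(), nearest()) handles this more robustly.

```r
# Hypothetical gene annotation (chromosome, TSS position) and one mark site
genes <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  chr  = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  tss  = c(10000, 48000, 61000, 150000, 55000),
  stringsAsFactors = FALSE
)
site <- list(chr = "chr1", pos = 50000)

same_chr <- genes[genes$chr == site$chr, ]
up   <- same_chr[same_chr$tss < site$pos, ]
down <- same_chr[same_chr$tss > site$pos, ]

# Option 1: nearest gene on each side of the site (by coordinate only)
nearest <- c(up$gene[which.max(up$tss)], down$gene[which.min(down$tss)])
nearest  # "GeneB" "GeneC"

# Option 2: every gene whose TSS lies within +/- 100 kb of the site
window <- 100000
in_window <- same_chr$gene[abs(same_chr$tss - site$pos) <= window]
in_window  # "GeneA" "GeneB" "GeneC" "GeneD"
```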
What do you think about using logistic regression?
Sure, but what would your x and y variables be in the model?
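For logistic regression, y has to be binary, so continuous expression would first have to be dichotomized (e.g. expressed vs. not expressed), which throws away information. A minimal sketch, assuming a hypothetical 0/1 outcome created with an arbitrary cutoff on simulated expression, with the mark signal as x:

```r
set.seed(5)
n <- 50
dat <- data.frame(mark1H3k27me2 = runif(n, 0, 10))
expr <- 8 - 0.6 * dat$mark1H3k27me2 + rnorm(n)
# Hypothetical binary outcome: "expressed" if above an arbitrary cutoff of 5
dat$Expressed <- as.integer(expr > 5)

# y must be binary (0/1); x is the mark's signal
fit_logit <- glm(Expressed ~ mark1H3k27me2, data = dat, family = binomial)
summary(fit_logit)$coefficients
# The slope is on the log-odds scale; exp() gives the odds ratio per unit of signal
exp(coef(fit_logit)[["mark1H3k27me2"]])
```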