[construct a model.matrix ]
1
1
Entering edit mode
7 weeks ago
kyj222637 ▴ 10

Dear Biostars community,

I have two questions.

First, let me explain about my data.

T0 (the starting point of medication) and T4 (the time point when DNA methylation data was obtained from blood samples) are separated by a span of two years.

We coded individuals who took lithium for two years as 'cases' and those who did not as 'controls,' assigning each group an Age_group value of 1 or 0, respectively.

[data] result (data.frame)

enter image description here

The goal of this analysis is to examine whether lithium treatment could reduce the Epigenetic Age predicted from DNA methylation data.

The covariates included in the model are Age, Sex, PC1, PC2, PC3, PC4, PC5, B, CD4T, CD8T, Mono, NK, and Neutro (cell type proportions).

Initially, I simply used


lm(Epigenetic_age ~ Age_group + Age + Sex + PC1 + PC2 + PC3 + PC4 + PC5 + B + CD4T + CD8T + Mono + NK + Neutro)

to observe the association.

However, to achieve a more accurate model, I implemented model.matrix directly based on the guidelines from the following source: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#treating-factors-that-are-not-of-direct-interest-as-random-effects.

I intended to include individual as a random effect ,therefore, I grouped the samples based on Group) Age_group_Time

enter image description here

and created the model.matrix accordingly


design <- model.matrix( ~ 0 + Group + AGE + SEX + BMI + ALCOHOL_USE + SMOKER + BD_Type + batch + Acute_Stable + Center + PC1 + PC2 + PC3 + PC4 + PC5 + B + NK + CD4T + CD8T + Mono + Neutro , data = result)

And then, I ran the lmfit and set up a contrast matrix for comparison.

fit<- lmFit(object = result$Epigenetic_Age, design, block = Subject, correlation = cor$consensus.correlation)

contrasts_Eigenetic_Age <- makeContrasts(
  R_T4vsT0=Group1_T4-Group1_T0,
  NR_T4vsT0=Group0_T4-Group0_T0,
  RvsNR=(Group1_T4-Group1_T0)-(Group0_T4-Group0_T0),
  levels = colnames(design)
)


fit_PCGrimAge <- contrasts.fit(fit, contrasts_Epigenetic_Age)
fit_PCGrimAge$coefficients

Currently, I can only obtain coefficients as output.

1. Could you advise on how I could modify my R code to derive p-values and determine statistical significance?

<h6>#</h6>

In another analysis, each individual is split into Case and Control groups without a two-year time gap.

Specifically, Case refers to individuals who took lithium over the past two years, and Control to those who took other medications within the same period.

enter image description here

I aimed to observe the effects of two years of lithium treatment on Epigenetic Age inferred from DNA methylation data.

I implemented the analysis with the following code,


lmm <- lm(Epigenetic_Age ~ AGE_Group + AGE + SEX + PC1 + PC2 + PC3 + PC4 + PC5 + B + NK + CD4T + CD8T + Mono + Neutro, data = result2)

yet I'm uncertain if it was executed correctly.

Furthermore, the results show large estimates for certain covariates (B, CD4T, CD8T, Mono, NK, Neutro).

enter image description here

Could you provide insights on how to interpret these results? Or Could you please give me an advice for more appropriate model for this analysis?

Any advice would be greatly appreciated.

Thank you.

Best

Yujin

regression lm methylation model_matrix • 733 views
ADD COMMENT
0
Entering edit mode

You're basically missing the following steps after fit_PCGrimAge <- contrasts.fit(fit, contrasts_Epigenetic_Age), I think:

fit3 <- limma::eBayes(fit_PCGrimAge)
#Choose thresholds appropriately, this is just an example.
fitsum <- summary(limma::decideTests(fit3, p.value = 0.05, adjust.method = "BH", lfc = 2))
head(fitsum)
#Choose a coefficient appropriately, this is just an example.
fittab <- limma::topTable(fit3, coef = colnames(fit3$coefficients)[1], number = Inf, adjust.method = "BH", sort.by = "p")
head(fittab)

fittab should have the p-values and log fold changes for the contrast(s) you are interested in (RvsNR I'm guessing).

ADD REPLY
0
Entering edit mode

Dear Dunois,

Thanks for your reply.

I thought I cannot use eBayes because compared to methylation array data which has many model (many CpG sites), I have only one model where dependent variable is epigenetic_age...!

Is it okay to use limma for further steps?

Many thanks,

Yujin

ADD REPLY
0
Entering edit mode

I am unfortunately not sufficiently qualified to comment on this aspect of your problem.

Gordon Smyth , the developer of limma and statistician extraordinaire is (if I am correct) still active here. Let's see if they can chime in perhaps. If you don't get a response here, I suggest also trying the Bioconductor forums and perhaps Cross Validated.

ADD REPLY
0
Entering edit mode

Thank you for your comment Dunois!

Best,

Yujin

ADD REPLY
0
Entering edit mode

You are welcome.

ADD REPLY
2
Entering edit mode
ADD COMMENT
0
Entering edit mode

Thank you so much!

Best,

Yujin

ADD REPLY

Login before adding your answer.

Traffic: 2016 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6