Dear BioStar Users,
I am using minfi to analyze the results of an EPIC array. To check differential methylation I decided to look for differentially methylated regions (DMRs) using bumphunter as recommended.
bumphunter
function returns candidate DMRs with several columns of information including value
and area
(clarification in this biostars post).
For 2 groups, value
column is the average of the coefficients of the linear model calculated for each site within the DMR while area
column is the sum of all coefficients. This can be easily checked by calculating the difference, between the 2 groups, of the mean value (M or beta) for each methylation site of the DMR and then averaging (value
column) or summing (area
column) these differences.
Thus, a positive value
indicates more methylation in the non-control group, and the opposite for a negative value
. Until here, everything is clear and makes sense.
My problem is that I have 3 groups, say control, drug A and drug B for example. But, bumphunter
still returns a unique value
and area
. How do you interpret the value
and area
of each candidate DMR? What do they mean in terms methylation levels in each group? How would you filter them to get biological information?
To answer these questions I went to check the function code but got more confused.
bumphunter
function can be accessed by typing getMethod("bumphunter", "matrix")
and you get:
function (object, ...)
{
.local <- function (object, design, chr = NULL, pos, cluster = NULL,
coef = 2, cutoff = NULL, pickCutoff = FALSE, pickCutoffQ = 0.99,
maxGap = 500, nullMethod = c("permutation", "bootstrap"),
smooth = FALSE, smoothFunction = locfitByCluster, useWeights = FALSE,
B = ncol(permutations), permutations = NULL, verbose = TRUE,
...)
{
nullMethod <- match.arg(nullMethod)
if (missing(design))
stop("design must be specified")
if (missing(pos))
stop("If object is a matrix, pos must be specified")
bumphunterEngine(object, design = design, chr = chr,
pos, cluster = cluster, coef = coef, cutoff = cutoff,
pickCutoff = pickCutoff, pickCutoffQ = pickCutoffQ,
maxGap = maxGap, nullMethod = nullMethod, smooth = smooth,
smoothFunction = smoothFunction, useWeights = useWeights,
B = B, permutations = NULL, verbose = verbose, ...)
}
.local(object, ...)
}
So we find that it is, basically, a S4 method to call the bumphunterEngine
function that does most of the job.
So, typing bumphunterEngine
you get:
function (mat, design, chr = NULL, pos, cluster = NULL, coef = 2,
cutoff = NULL, pickCutoff = FALSE, pickCutoffQ = 0.99, maxGap = 500,
nullMethod = c("permutation", "bootstrap"), smooth = FALSE,
smoothFunction = locfitByCluster, useWeights = FALSE, B = ncol(permutations),
permutations = NULL, verbose = TRUE, ...)
{
...
if (useWeights & smooth) {
tmp <- .getEstimate(mat = mat, design = design, coef = coef,
full = TRUE)
rawBeta <- tmp$coef
weights <- 1/tmp$sigma
rm(tmp)
}
else {
rawBeta <- .getEstimate(mat = mat, design = design, coef = coef,
full = FALSE)
weights <- NULL
}
...
else {
beta <- rawBeta
Index <- seq(along = beta)
}
...
comp <- computation.tots2(tab = tab, A = A)
rate2 <- comp$rate2
pvalues2 <- comp$pvalues2
tab$p.value <- pvalues1
tab$fwer <- rate1
tab$p.valueArea <- pvalues2
tab$fwerArea <- rate2
tab <- tab[order(tab$fwer, -tab$area), ]
algorithm <- list(version = packageDescription("bumphunter")$Version,
coef = coef, cutoff = cutoff, pickCutoff = pickCutoff,
pickCutoffQ = pickCutoffQ, smooth = smooth, maxGap = maxGap,
nullMethod = nullMethod, B = B, permutations = permutations,
useWeights = useWeights, smoothFunction = deparse(substitute(smoothFunction)))
ret <- list(table = tab, coef = rawBeta, fitted = beta, pvaluesMarginal = pvs,
null = list(value = V, length = L), algorithm = algorithm)
class(ret) <- "bumps"
return(ret)
}
It is a long function, but the relevant parts for this problem IMO are coef = 2
, in the parameters of the function, and .getEstimate(mat = mat, design = design, coef = coef, ...)
which calculates the coefficients.
Regarding coef = 2
, we can see that the original bumphunter
function already has it defined as defaults. Also, knowing that bumhunter was originally defined to compare 2 groups I went to check the minfi wrapper to see if they implemented a change.
setMethod(
"bumphunter",
signature(object = "GenomicRatioSet"),
function(object, design, cluster = NULL, coef = 2, cutoff = NULL,
pickCutoff = FALSE, pickCutoffQ = 0.99, maxGap = 500,
nullMethod = c("permutation", "bootstrap"), smooth = FALSE,
smoothFunction = locfitByCluster, useWeights = FALSE,
B = ncol(permutations), permutations = NULL, verbose = TRUE,
type = c("Beta","M"), ...) {
.isMatrixBackedOrStop(object, "bumphunter")
type <- match.arg(type)
bumphunterEngine(
mat = getMethSignal(object, type),
design = design,
chr = as.factor(seqnames(object)),
pos = start(object),
cluster = cluster,
coef = coef,
cutoff = cutoff,
pickCutoff = pickCutoff,
pickCutoffQ = pickCutoffQ,
maxGap = maxGap,
nullMethod = nullMethod,
smooth = smooth,
smoothFunction = smoothFunction,
useWeights = useWeights,
B = B,
permutations = permutations,
verbose = verbose,
...)
}
)
We see that the minfi wrapper basically fits the right matrix of data (mat = getMethSignal(object, type)
) to dhe bumphunterEngine
function keeping the coef = 2
option.
At this point, I went to check the other function .getEstimate(mat = mat, design = design, coef = coef, ...)
which calculates the coefficients, if I understood properly, based on the data matrix (mat = mat
), the experimental design (design = design
) and the coefficient (coef = coef
, which gets set to 2). The code can be found here.
.getEstimate <- function(mat, design, coef, B=NULL, permutations=NULL,
full=FALSE) {
v <- design[,coef]
A <- design[,-coef, drop=FALSE]
qa <- qr(A)
S <- diag(nrow(A)) - tcrossprod(qr.Q(qa))
vv <- if (is.null(B)) matrix(v, ncol=1) else{
if(is.null(permutations)){
replicate(B, sample(v))
} else{
apply(permutations,2,function(i) v[i])
}
}
sv <- S %*% vv
vsv <- diag(crossprod(vv,sv))
b <- (mat %*% crossprod(S, vv)) / vsv
if(!is.matrix(b))
b <- matrix(b, ncol = 1)
if (full) {
sy <- mat %*% S
df.residual <- ncol(mat) - qa$rank - 1
if (is.null(B)) {
sigma <- matrix(sqrt(rowSums((sy - tcrossprod(b, sv))^2) / df.residual), ncol=1)
} else {
sigma <- b
tmp <- sy
for (j in 1:B) {
tmp <- tcrossprod(b[,j], sv[,j])
sigma[,j] <- rowSums((sy-tmp)^2)
}
sigma <- sqrt(sigma/df.residual)
}
out <- list(coef=b,
sigma=sigma,
stdev.unscaled=sqrt(1/vsv),
df.residual=df.residual)
if (is.null(B)) out$stdev <- as.numeric(out$stdev)
} else {
out <- b
}
return(out)
}
and my design would be something like:
des.groups <- c(rep("control", 2), rep("drug.A", 2), rep('drug.B', 2))
design <- model.matrix(~ des.groups)
design
(Intercept) des.groupsdrug.A des.groupsdrug.B
1 1 0 0
2 1 0 0
3 1 1 0
4 1 1 0
5 1 0 1
6 1 0 1
I am worried that, just at the beggining of the .getEstimate
function, it subsets the design based on the coef
argument
v <- design[,coef]
getting
> v
1 2 3 4 5 6
0 0 1 1 0 0
However this is as far as I can go with my linear algebra knowledge. So I am wondering if the value
and area
returned by bumphunter
reflect the mehylation changes of "just" 2 groups (in my case control (Intercept) and des.groupsdrug.A).
What do you think? If this is the case, how would you address the comparison among 3 groups? with 3 pairwise comparisons?
If you arrived here, THANKS for reading the long post and please share your thoughts.