I repeatedly read that LIMMA is performing an ANOVA analysis, apparently always in the background, and also that the ANOVA results are obtainable. Unfortunately, though, I can't get/find this information and would appreciate it a lot if someone could enlighten me.
Below, I post the code I use to get my differential expression analysis from LIMMA. Its a shortened version, as I also calculate differences between all of my groups, but I hope its obvious that I have 6 groups (MB, PM, MC, MM, B, and PMN) and calculate the differential expression between all of them (these are developmental stages of neutrophil granulocytes, a white blood cell type).
It would be great to know how i could get an ANOVA, in the sense of a table with proteins (as this is a proteome analysis), are differentially expressed between all the groups. Currently, I am doing this with extra analysis.
Thanks a lot! Sebastian
# copy data
san <- san_gnp_pro # sample annotation
exp_4DE <- exp_gnp_pro_vsn # expression table (VSN normalized counts)
# match order of san
san <- san[match(names(exp_4DE), san$id),]
# design
# target: san$pop_gen
design_popGen <- model.matrix( ~0 + pop_gen, data = san)
# contrast:
contrast_popGen <- makeContrasts(
MBvsPMN = pop_genMB - pop_genPMN,
PMvsPMN = pop_genPM - pop_genPMN,
MCvsPMN = pop_genMC - pop_genPMN,
MMvsPMN = pop_genMM - pop_genPMN,
BvsPMN = pop_genB - pop_genPMN,
levels = design_popGen)
# function for LIMMA
limma_getRes <- function(df, design, contrast) {
fit1 <- lmFit(df, design)
fit2 <- contrasts.fit(fit1, contrasts = contrast)
fit3 <- eBayes(fit2, trend = T, robust = T)
summary <- decideTests(fit3, p.value= 0.05, lfc=1, method = "separate", adjust.method = "BH")
results <- list(
fit1 = fit1,
fit2 = fit2,
fit3 = fit3,
summary = summary)
return(results)
}
# execute
limma_getRes fct DE_popGen <- limma_getRes(exp_4DE, design_popGen, contrast_popGen)
#export diffEx tables from limmas fit3 objects
extract_popGen <- function(fit3){
MBvsPMN <- topTable(fit3, number = nrow(fit3), coef = "MBvsPMN") #vsPMN 1
PMvsPMN <- topTable(fit3, number = nrow(fit3), coef = "PMvsPMN") #vsPMN 2
MCvsPMN <- topTable(fit3, number = nrow(fit3), coef = "MCvsPMN") #vsPMN 3
MMvsPMN <- topTable(fit3, number = nrow(fit3), coef = "MMvsPMN") #vsPMN 4
BvsPMN <- topTable(fit3, number = nrow(fit3), coef = "BvsPMN") #vsPMN 5
result <- list(
MBvsPMN = MBvsPMN, #1
PMvsPMN = PMvsPMN, #2
MCvsPMN = MCvsPMN, #3
MMvsPMN = MMvsPMN, #4
BvsPMN = BvsPMN #5
)
return(result)
}
## extract results
extract results DE_popGen <- extract_popGen(DE_popGen$fit3)
#push rownames to fid col
DE_popGen<- map(DE_popGen, rownames_to_column, var = "fid")
# as df
DE_popGen_df <- bind_rows(DE_popGen, .id = "comparison")
Thanks a lot for your suggestion! I get quite a lot of genes with ad.p < 0.05 (6773 out of 7543). In the single contrasts I have around 1-2k genes per group, but that is controlled via logfoldChange as well. I guess that explains it :) Would there be a way to control for a "meaningful" change by the F statistics? I found examples of how to calculate the empirical vs critical for a small number of comparisons (well, 1...) but it doesn't seem to be easily applicable to a list of thousands of genes. Any suggestions are welcome but it's a great start for sure. Thanks a lot!