I'm attempting to carry out a differential expression analysis upon a GEO dataset, specifically focusing on the epithelial-mesenchymal transition (EMT). However I've gotten no signifcant results and I don't know where I'm going wrong.
I'm using empirical Bayes in limma to identify differentially expressed micro RNA. The experiment is done using the NCI-60 panel, and I've built the design matrix by categorising each cell line as either "epithelial" or "mesenchymal", based off Figure 1 of this paper (which categorises each cell line on the basis of the ratio of E-cadherin to Vimentin expression).
These are the top results I get after analysis. The lowest adjusted p-value is 0.20. The analysis that I'm trying to replicate got ~100 differentially expressed miRNAs to my 0.
ID miRNA_ID logFC P.Value adj.P.Val
10946 10946 hsa-miR-141 4.285924 0.0001753329 0.2018081
17427 17427 hsa-miR-200c 4.277219 0.0004007614 0.2306382
27777 27777 hsa-miR-200a 3.738894 0.0016939681 0.5190115
9578 9578 hsa-miR-200b 3.992947 0.0018036889 0.5190115
18900 18900 hsa-miR-200b 3.826707 0.0026484041 0.6096626
11000 11000 hsa-miR-200a 3.366898 0.0035893928 0.6885652
Additionally, the distribution of uncorrected p-values clusters around 1. Which definitely indicates that something is up.
The relevant bits of code I'm using (cell_type_nci60.csv can be found here):
library(limma)
library(GEOquery)
#####################################################################
## Step 2: GEO Preprocessing (miRNA)
#####################################################################
G <- getGEO("GSE26375",GSEMatrix=T)
eset.geo <- G[[1]] # Get the ExpressionSet object
# Normalization: may not be necessary as GEO datasets should be pre-processed.
#library(affyPLM)
#eset.geo <- normalize.ExpressionSet.quantiles(eset.geo)
#####################################################################
## Step 3: Unify the sample names so can build the design matrix
#####################################################################
cleanNames <- function(names) {
names <- gsub("-", "", names)
names <- gsub(" ", "", names)
names <- gsub("/", "", names)
names <- gsub("\\(TB\\)", "", names) # Bit hacky but oh well
}
# Want to see each cell line, as need to split by epithelial/mesenchymal
# Phenotype data taken from Sun-Mi Park et al (2008)
pheno.cat <- read.csv("Data/cell_type_nci60.csv")
pheno.cat$title <- cleanNamespheno.cat$title)
mirna.names <- cleanNames(sapply(pData(phenoData(eset.geo.n))[,1:2]$title, as.character))
pheno.mirna <- sapplypheno.cat[match(geo.names,pheno.cat$title),3], as.character)
#####################################################################
## Step 4: Set up bits and pieces for limma
#####################################################################
# Design matrix
design.mirna <- cbind(mi.e=pheno.mirna=="epithelial",
mi.m=pheno.mirna=="mesenchymal")*1 # to make it numeric
# Make contrast matrix
c.mirna <- makeContrasts(mi.e-mi.m, levels=design.mirna)
#####################################################################
## Step 5: Perform differential expression analysis
#####################################################################
### miRNA Analysis:
# Fit the linear model
fit.mirna <- lmFit(eset.geo, design.mirna)
fit2.mirna <- contrasts.fit(fit.mirna, c.mirna)
fit2.mirna <- eBayes(fit2.mirna)
# Get the first few results
head(topTable(fit2.mirna, coef=1))
# Output everything
n.mirna <- length(featureNames(eset.mirna))
top.mirna <- topTable(fit2.mirna, coef=1, n=n.mirna, adjust="fdr")
# Generate output
write.table(top.mirna, file="mirna_de.out", quote=F, sep="\t", row.names=F)
Make sure you're associating the appropriate samples to the appropriate group in your design matrix. You can also just compare what you get with GEO2R directly in GEO.