miRNA differential expression analysis, no significant adjusted p-values
2
0
Entering edit mode
8.3 years ago

I'm attempting to carry out a differential expression analysis upon a GEO dataset, specifically focusing on the epithelial-mesenchymal transition (EMT). However I've gotten no signifcant results and I don't know where I'm going wrong.

I'm using empirical Bayes in limma to identify differentially expressed micro RNA. The experiment is done using the NCI-60 panel, and I've built the design matrix by categorising each cell line as either "epithelial" or "mesenchymal", based off Figure 1 of this paper (which categorises each cell line on the basis of the ratio of E-cadherin to Vimentin expression).

These are the top results I get after analysis. The lowest adjusted p-value is 0.20. The analysis that I'm trying to replicate got ~100 differentially expressed miRNAs to my 0.

         ID     miRNA_ID    logFC      P.Value adj.P.Val
10946 10946  hsa-miR-141 4.285924 0.0001753329 0.2018081
17427 17427 hsa-miR-200c 4.277219 0.0004007614 0.2306382
27777 27777 hsa-miR-200a 3.738894 0.0016939681 0.5190115
9578   9578 hsa-miR-200b 3.992947 0.0018036889 0.5190115
18900 18900 hsa-miR-200b 3.826707 0.0026484041 0.6096626
11000 11000 hsa-miR-200a 3.366898 0.0035893928 0.6885652

Additionally, the distribution of uncorrected p-values clusters around 1. Which definitely indicates that something is up.

The relevant bits of code I'm using (cell_type_nci60.csv can be found here):

library(limma)
library(GEOquery)

#####################################################################
## Step 2: GEO Preprocessing (miRNA)
#####################################################################

G <- getGEO("GSE26375",GSEMatrix=T) 
eset.geo <- G[[1]] # Get the ExpressionSet object

# Normalization: may not be necessary as GEO datasets should be pre-processed. 
#library(affyPLM) 
#eset.geo <- normalize.ExpressionSet.quantiles(eset.geo)

#####################################################################
## Step 3: Unify the sample names so can build the design matrix
#####################################################################

cleanNames <- function(names) {
  names <- gsub("-", "", names)
  names <- gsub(" ", "", names)
  names <- gsub("/", "", names)
  names <- gsub("\\(TB\\)", "", names) # Bit hacky but oh well
}

# Want to see each cell line, as need to split by epithelial/mesenchymal
# Phenotype data taken from Sun-Mi Park et al (2008)
pheno.cat <- read.csv("Data/cell_type_nci60.csv")
pheno.cat$title <- cleanNamespheno.cat$title)

mirna.names <- cleanNames(sapply(pData(phenoData(eset.geo.n))[,1:2]$title, as.character))
pheno.mirna <- sapplypheno.cat[match(geo.names,pheno.cat$title),3], as.character)

#####################################################################
## Step 4: Set up bits and pieces for limma
#####################################################################

# Design matrix 
design.mirna <- cbind(mi.e=pheno.mirna=="epithelial",
                      mi.m=pheno.mirna=="mesenchymal")*1 # to make it numeric
# Make contrast matrix
c.mirna <- makeContrasts(mi.e-mi.m, levels=design.mirna)

#####################################################################
## Step 5: Perform differential expression analysis
#####################################################################

### miRNA Analysis:

# Fit the linear model
fit.mirna <- lmFit(eset.geo, design.mirna) 
fit2.mirna <- contrasts.fit(fit.mirna, c.mirna) 
fit2.mirna <- eBayes(fit2.mirna) 

# Get the first few results
head(topTable(fit2.mirna, coef=1))

# Output everything
n.mirna <- length(featureNames(eset.mirna))
top.mirna <- topTable(fit2.mirna, coef=1, n=n.mirna, adjust="fdr")

# Generate output
write.table(top.mirna, file="mirna_de.out", quote=F, sep="\t", row.names=F)
R miRNA Differential Expression limma • 2.7k views
ADD COMMENT
1
Entering edit mode

Make sure you're associating the appropriate samples to the appropriate group in your design matrix. You can also just compare what you get with GEO2R directly in GEO.

ADD REPLY
2
Entering edit mode
8.3 years ago

The issue lay in the fact that the undefined samples acted as noise. Removing all non-epithelial and non-mesenchymal samples gave me better looking results!

ADD COMMENT
0
Entering edit mode
4.9 years ago

i have downloaded LUAD miRNA data from GEO which has miRNA_id and read counts ,how can i get p values?as shown in above fig.

ADD COMMENT

Login before adding your answer.

Traffic: 1479 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6