Help with differential expression microarray data using oligo: adjusted p values are very high
1
0
Entering edit mode
4.4 years ago

Hello Biostars I have been using the following script for differential expression of affymetrix microarray data for several datasets, however I keep getting very high adjusted p values so I believe I am missing out a step. Apologies in advance for the very long post but I think it's important to show all my code.

 library(GEOquery)
library(limma)
library(oligo)
library(dplyr)
library(matrixStats)
library(biomaRt)

########## file download and normalisation####################################
setwd("~/Documents/Gina")

getGEOSuppFiles("GSE112841")

list.files("GSE112841")

untar("GSE112841/GSE112841_RAW.tar", exdir = "GSE112841/CEL")

celfiles <- list.files("GSE112841/CEL", full = TRUE)

rawData <- read.celfiles(celfiles)

normData <- rma(rawData)

###### filtering for poorly expressed probes##################################

data_medians <- rowMedians(exprs(normData))

man_threshold <- 4

no_samples <- table(man_threshold)

samples_cutoff <- min(no_samples)

idx_man_threshold <- apply(Biobase::exprs(normData), 1,
                          function(x){
                               sum(x > man_threshold) >= samples_cutoff})
table(idx_man_threshold)

filtered_normData <- subset(normData, idx_man_threshold)

dim(filtered_normData) # 24154 genes

dim(normData) # 53617 genes

###### gene annotation#######################################################

mart <- useMart("ENSEMBL_MART_ENSEMBL")

mart <- useDataset("hsapiens_gene_ensembl", mart)

Glist <- getBM(attributes = c("affy_hugene_2_0_st_v1",
                              "external_gene_name", 
                              "gene_biotype"), 
               filters = "affy_hugene_2_0_st_v1", 
               values =rownames(exprs(filtered_normData)),
               mart = mart)

Biomart <- Glist[which(Glist$gene_biotype=="protein_coding"),]

DF <- as.data.frame(exprs(filtered_normData))

DF$Probe_Name <- rownames(DF)

names(DF) <- c("shNDRG1_1", "shNDRG1_2", "shNDRG1_3", 
               "Control_1", "Control_2", "Control_3", 
               "Probe_Name")

Biomart <- Biomart[! duplicated(Biomart$external_gene_name),]

Biomart <- Biomart[! duplicated(Biomart$affy_hugene_2_0_st_v1),]

DF <- DF[! duplicated(DF$Probe_Name),]

X <- merge(DF, Biomart, by.x = "Probe_Name", by.y = "affy_hugene_2_0_st_v1")

rownames(X) <- X$external_gene_name

X$Probe_Name <- X$Probe_Name  <- X$gene_biotype <- X$external_gene_name <- NULL

########### differential expression##########################################
Z <- rep(c("shNDRG1", "Control"), each = 3)

f = factor(Z,levels=c("shNDRG1","Control"))

design = model.matrix(~ 0 + f)

colnames(design) = c("shNDRG1","Control")

data.fit = lmFit(X, design)

contrast.matrix = makeContrasts(shNDRG1-Control,levels=design)

data.fit.con = contrasts.fit(data.fit,contrast.matrix)

data.fit.eb = eBayes(data.fit.con)

tab = topTable(data.fit.eb, number=Inf,adjust.method="fdr")

    head(tab)


             logFC     AveExpr         t      P.Value adj.P.Val         B
VLDLR       1.1195324  7.216597  8.877460 4.023989e-05 0.4014946 -2.950292
CCL28       1.4088015  4.193678  8.015200 7.887233e-05 0.4014946 -2.995123
SERHL2     -1.0807675  7.651956 -7.503533 1.210243e-04 0.4014946 -3.027914
CASP14     -1.2332972  4.224030 -7.443782 1.274233e-04 0.4014946 -3.032104
EPHB6      -1.1213133  5.036059 -7.232237 1.533330e-04 0.4014946 -3.047611
SRARP      -1.3676657  5.549409 -7.226247 1.541483e-04 0.4014946 -3.048066
NDRG1      -1.1750064  7.820946 -7.224252 1.544210e-04 0.4014946 -3.048217

Any help with sorting out the adjusted p values will be most appreciated. Thanks.

R microarray oligo limma • 1.5k views
ADD COMMENT
3
Entering edit mode
4.4 years ago

Not much to say, really. If nothing passes FDR correction, then we cannot solve that. I do not see anything unusual with your code. Just double check various things such as the fact that Z should match perfectly with the columns of your input expression data.

If you want, please paste the first few entries from your results table.

ADD COMMENT
0
Entering edit mode

I have added in the results table.

ADD REPLY
1
Entering edit mode

Thanks for adding. It looks like a genuine result - the absolute fold-changes are not even that high. Were you expecting something to be statistically significantly different?

If this is the genuine result based on this small 3 vs 3 comparison, then you'd require a larger cohort in order to find a statistically significantly different effect in this disease / condition of study.

ADD REPLY
0
Entering edit mode

Thank you for your response. I was hoping there would be some DE genes and that the gene knockdown of NDRG1 was successful. But it looks like the difference between the control and knockdown is not significant enough. I was just curious if it was my code or the data, and it seems like the latter was the issue. I will find a larger dataset .

ADD REPLY
1
Entering edit mode

Check that there are not other probes targeting this gene, which could hamper the statistical inferences made for it. For example, when you normalise, you can try other options with rma():

ADD REPLY
0
Entering edit mode

I have checked and I found only one probe for NDRG1

affy_hugene_2_0_st_v1 ensembl_gene_id external_gene_name
1              17081401 ENSG00000104419              NDRG1
                                                      description   gene_biotype
1 N-myc downstream regulated 1 [Source:HGNC Symbol;Acc:HGNC:7679] protein_coding
ADD REPLY
1
Entering edit mode

I see. Your options are limited, it seems. Depending on what you want to do with this data, you could report nominal p-values, but this would be extremely difficult to publish.

ADD REPLY

Login before adding your answer.

Traffic: 1709 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6