Tutorial:Importance of Array Quality Control - arrayQualityMetrics (PART I)
0
3
Entering edit mode
8.4 years ago
majuang66 ▴ 140

Today's topic is the importance of array quality control. Many articles and websites related to microarray analysis have demonstrated that array quality control is privotal process involved in improvement of identification of differential gene expression. I introduce one example associated with array quality through arrayQualityMetrics.

# Open R
# Import ArrayExpress data (E-GEOD-3419) into R
source("http://bioconductor.org/biocLite.R")
biocLite("ArrayExpress")
library(ArrayExpress)
AEset<-ArrayExpress("E-GEOD-3419")

# Normalization of E-GEOD-3419
library(affy)
AEsetnorm<-rma(AEset)

# Array Quality Control through arraQualityMetrics
biocLite("arrayQualityMetrics")
library(arrayQualtiyMetrics)

# arrayQualityMetrics of AEset (RAW data of E-GEOD-3419)
arrayQualityMetrics(expressionset=AEset,outdir="Report_for_E-GEOD-3419",force=TRUE,do.logtransform=TRUE)

# arrayQualityMetrics of AEsetnorm (normalized data of E-GEOD-3419) - rma normalization convert expression value into log-transformed expression value, so we do ont input "do.logtransform".
arrayQualityMetrics(expressionset=AEsetnorm,outdir="Report_for_nE-GEOD-3419",force=TRUE)

Comparison AEset with AEsetnorm

Array metadata and outlier detection overview

  • These tables showed change of outlier arrays before and after normalization.
  • Through these tables and plots, I identified outlier arrays (#12 and #16 array) and analyzed DEGs in all arrays and subsets removed outlier arrays through limma
# DEGs analysis (limma) at AEsetnorm dataset removed outlier arrays
library(Biobase)
library(limma)

fvarLabels(AEsetnorm)<-make.names(fvarLabels(AEsetnorm))
gsms<-"01000101110X110X" # first X is array # 12 and second X is array #16

sml<-c()
for(i in 1:nchar(gsms)) { sml[i]<-substr(gsms,i,i) }
sel<-which(sml !="X")
sml<-sml[sel]
gset<-AEsetnorm[,sel]
sml<-paste("G",sml,sep="")
fl<-as.factor(sml)
gset$description<-fl
design<-model.matrix(~ description + 0,gset)
colnames(design)<-levels(fl)
fit<-lmFit(gset,design)
cont.matrix<-makeContrasts(G1-G0, levels=design)
fit2<-contrasts.fit(fit,cont.matrix)
fit2<-eBayes(fit2, 0.01)
tT<-topTable(fit2, adjust="fdr",sort.by="B",number=250)
class(tT)

'data.frame':   250 obs. of  6 variables:
 $ logFC    : num  1.58 -1.6 -2.16 1.67 -3.12 ...
 $ AveExpr  : num  5.95 7.35 8.25 8.52 8.74 ...
 $ t        : num  12.49 -9.36 -9.1 9.06 -8.87 ...
 $ P.Value  : num  1.87e-09 9.66e-08 1.40e-07 1.48e-07 1.96e-07 ...
 $ adj.P.Val: num  4.18e-05 8.25e-04 8.25e-04 8.25e-04 8.74e-04 ...
 $ B        : num  10.62 7.62 7.31 7.27 7.04 ...

tT<-subset(tT,select=c("AveExpr","adj.P.Val","P.Value","t","B","logFC"))

write.table(tT,file="GSE3419.txt",row.names=T,sep="\t")

# DEGs analysis (limma) at AEsetnorm dataset (all arrays - 16 arrays)
fvarLabels(AEsetnorm)<-make.names(fvarLabels(AEsetnorm))

gsms_1<-"0100010111001100"

sml_1<-c()
for(i in 1:nchar(gsms_1)) { sml_1[i]<-substr(gsms_1,i,i) }

sml_1<-paste("G",sml_1,sep="")

fl<-as.factor(sml_1)
AEsetnorm$description<-fl

design<-model.matrix(~ description + 0,AEsetnorm)
colnames(design)<-levels(fl)

fit1<-lmFit(AEsetnorm,design)

cont.matrix<-makeContrasts(G1-G0,levels=design)

fit2<-contrasts.fit(fit1,cont.matrix)
fit2<-eBayes(fit2,0.01)

tT<-topTable(fit2,adjust="fdr",sort.by="B",number=250)
tT<-subset(tT,select=c("AveExpr","adj.P.Val","P.Value","t","B","logFC"))

write.table(tT,file="GSE3419_2.txt",row.names=T,sep="\t")

Venndiagram of 2-fold change DEGs between AEsetnorm dataset removed outlier arrays and AEsetnorm all arrays Up-regulated Genes Gene 0.5<DEGs<2 Down-regulated Genes

I identified the clear difference before and after outlier removal. In conclusion, I think that outlier removal is pivotal process to find DEGs correctly.

gene R • 3.9k views
ADD COMMENT
0
Entering edit mode

Thank you majuang66,helpful and nice Tutorial

ADD REPLY

Login before adding your answer.

Traffic: 1664 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6