4.3 years ago
salamandra
I want to normalize Agilent data (one-color array).
I followed the limma package user guide, but ended up with some genes repeated across different probes. How can I get the data summarized by gene instead of by probe? Is it OK to just average the probes that map to the same gene?
Code so far:
library(limma)
library(tidyverse)
library(RnAgilentDesign028282.db)
# read SDRF file - equivalent to "targets" file in limma:
SDRF <- read.delim(SDRF.path,check.names=FALSE,stringsAsFactors=FALSE)
# read the gene expression intensity data:
x <- read.maimages(file.path(dataDir, studyId, Data.Name, SDRF[,"Array Data File"]),
                   source="agilent", green.only=TRUE, other.columns="gIsWellAboveBG")
# gene annotation:
x$genes$EntrezID <- mapIds(RnAgilentDesign028282.db, x$genes$ProbeName, keytype="PROBEID", column="ENTREZID")
x$genes$Symbol <- mapIds(RnAgilentDesign028282.db, x$genes$ProbeName, keytype="PROBEID", column="SYMBOL")
# normexp background correction and quantile normalization:
y <- backgroundCorrect(x, method="normexp")
y <- normalizeBetweenArrays(y, method="quantile")
## filter probes:
Control <- y$genes$ControlType==1
NoEntrez <- is.na(y$genes$EntrezID)
IsExpr <- rowSums(y$other$gIsWellAboveBG > 0) >= 4
yfilt <- y[!Control & !NoEntrez & IsExpr, ]
# remove annotation columns not needed:
yfilt$genes <- yfilt$genes[,c("ProbeName","Symbol","EntrezID")]
# convert the filtered EList to a data frame:
normData <- as.data.frame(yfilt)
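For reference, here is a minimal sketch of what "averaging probes of the same gene" means, using base R on a toy matrix. The values, probe names and gene labels are made up for illustration and are not from my data. limma also provides `avereps()`, which averages rows sharing the same ID, so the commented line at the end is roughly how it might be applied to `yfilt`. Averaging is only one option, and whether it is appropriate here is the open question.

```r
# Toy log2 expression matrix: 4 probes x 3 arrays (made-up values).
expr <- matrix(c(8, 10, 5, 7,
                 9, 11, 6, 7,
                 7,  9, 4, 7),
               nrow = 4,
               dimnames = list(c("p1", "p2", "p3", "p4"),
                               c("a1", "a2", "a3")))

# Hypothetical probe-to-gene mapping: p1 and p2 target the same gene.
gene <- c("G1", "G1", "G2", "G3")

# Sum the rows of each gene, then divide by the number of probes per gene.
sums <- rowsum(expr, group = gene, reorder = FALSE)
n <- as.vector(table(factor(gene, levels = rownames(sums))))
geneExpr <- sums / n  # one row per gene, one column per array

# With limma loaded, the analogous step on the filtered EList would be roughly:
# ygene <- avereps(yfilt, ID = yfilt$genes$EntrezID)
```

The result has one row per gene, so `G1` holds the mean of `p1` and `p2` on each array while `G2` and `G3` are unchanged.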