Hi All,
I wonder if anyone is familiar with the Bioconductor RankProduct package for ranking and obtaining differentiallly expressed genes. Some info about the software are as follows:
http://bioinformatics.oxfordjournals.org/content/22/22/2825.full
http://www.bioconductor.org/packages/release/bioc/manuals/RankProd/man/RankProd.pdf
http://www.bioconductor.org/packages/2.11/bioc/vignettes/RankProd/inst/doc/RankProd.pdf
I ran into some problems while using the program, maybe because of my little knowledge of R language. I tried to replicate the steps in the pdf files above with my own data. Although my own datasets were not in the afffy .cel files as in the examples, but only as rows and columns in a tab-delimited file. I have two conditions (1 and 2, replicate = 4 for each)
My codes are as follows:
library(RankProd)
library(preprocessCore)
#Read expression data
#gdata <- read.table(file="data2.txt", sep="\t", header=T) #9000 rows of genes X 8 columns of chips
gdata <- read.table(file="data2.txt", sep="\t", header=T, row.names=1) #9000 rows of genes X 8 columns of chips
#colnames(gdata)
# This vector contains the microarray sample names
SampleNames= names(data.frame(gdata[,-1]))
#names(datExpr)=gdata[,1]
# This vector contains the gene names
datExpr.gnames= gdata$GeneName
# Since the first column contains the gene names, exclude it.
# dataExp is then the matix required
datExpr=data.frame(gdata[,-1])
#convert data into matrix form
datExpr <- as.matrix(datExpr)
#data normalization - quantile normalization
#datExpr.log.norm <- normalize.quantiles((log2(datExpr)),copy=TRUE) #with logged data
datExpr <- datExpr.log.norm
#datExpr.norm <- normalize.quantiles(datExpr,copy=TRUE) #without logged data
#datExpr <- datExpr.norm
# Identify two class data - control/treatment (or condition 1/condition2)
nl <- 4
n2 <- 4
cl <- rep(c(0,1), c(nl, n2))
datExpr.cl <- cl
# data were generated under identical or very similar conditions except the
# factor of interest (e.g., control and treatment),
origin <- rep(1, nl + n2)
datExpr.origin <- origin
# Data anslysis
datExpr.sub <- datExpr[,which(datExpr.origin == 1)]
datExpr.cl.sub <- datExpr.cl[which(datExpr.origin == 1)]
datExpr.origin.sub <- datExpr.origin[which(datExpr.origin == 1)]
#Rank product analysis and output
#RP.out <- RP(datExpr.sub, datExpr.cl.sub, num.perm = 100, logged = TRUE,na.rm = FALSE, plot = FALSE, rand = 123)
RP.out <- RPadvance(datExpr.sub, datExpr.cl.sub, datExpr.origin.sub, num.perm = 100,logged = TRUE,
na.rm = FALSE, gene.names = datExpr.gnames, plot = FALSE,rand = 123)
# Output a table of the identified genes based on user-specified selection criteria
topGene(RP.out, cutoff = 0.05, method = "pfp", logged = TRUE,logbase = 2, gene.names = datExpr.gnames)
I did run the code, but my fold changes for differentially expressed genes in one condition VS the other were either 0's or infinities. I wonder if anyone with experience with this program can help me.
Thank you
James
Cross-posted from SO: http://stackoverflow.com/questions/14868462/gene-ranking-microarray. We don't mind but we like to know :)