Question

Dichotomizing Gene Expression Data Based On The Median

0

12.4 years ago

moranr ▴ 290

Hi,
I have an expression matrix, gse14814.gcrma, which I have normalised and filtered.

current_study <- gse14814.gcrma[gse14814_probes,]

These are the genes that I wish to use, along with their expression values.

I have created an empty matrix, di_matrix, with the dimensions of the above expression matrix (90 rows by 1567 genes/columns), which I want to fill with dichotomized values based on the expression. For each gene, I want to calculate the median expression for that column/gene. Then, for each array and each gene, if the expression is above the median, assign a 1 to the same position in di_matrix; if it is lower, assign a 0.

So I think I should create a for loop:

rownames(di_matrix) <- sampleNames(current_study)
colnames(di_matrix) <- featureNames(current_study)

for (i in 1:1567) {
    medianVal <- median(exprs(current_study[,i]))
    current_logical <- exprs(current_study[,i]) > medianVAL
    current_di_gene <- as.numeric(current_logical)
    di_matrix[,i] <- current_di_gene
}

This is wrong; it gives me back:

Error in gse14814dimatrix[, i] <- currentdigene : number of items to replace is not a multiple of replacement length

I'm sorry, I don't have a lot of experience in R; I'm very much a beginner.

Thanks for the help, R

bioconductor microarray r • 4.2k views

Updated 10.7 years ago by Biostar 20 • written 12.4 years ago by moranr ▴ 290

1

Try to use apply functions instead of loops. http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm

12.4 years ago by zx8754 12k

1

Just a comment here since I think the answers are going to get you where you need to go. If you find yourself using a "for" loop over rows or columns, you should look for an "apply" that fits your needs instead. Using an "apply" can sometimes be orders-of-magnitude faster for the same result.

12.4 years ago by Sean Davis 27k
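
To make the loop-versus-apply suggestion concrete, here is a minimal sketch of the same per-column computation written both ways. The toy matrix `m` and the names `med_loop` and `med_apply` are made up for illustration; they are not the poster's data.

```r
# Hypothetical toy data: 5 samples (rows) by 3 genes (columns)
set.seed(1)
m <- matrix(rnorm(15), nrow = 5, ncol = 3)

# Loop version: preallocate, then fill in one column median at a time
med_loop <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  med_loop[j] <- median(m[, j])
}

# apply version: MARGIN = 2 means "call the function on each column"
med_apply <- apply(m, 2, median)

# Both approaches give the same per-column medians
stopifnot(isTRUE(all.equal(med_loop, med_apply)))
```

How much speed this buys depends on the function being applied; the more reliable gain is shorter code with no index bookkeeping.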

0

I'll test this in the morning. Thanks for the advice, I really appreciate it, guys.

12.4 years ago by moranr ▴ 290

score 5 · Answer 1 · 2013-02-26

5

12.4 years ago

zx8754 12k

Try this:

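# create dummy data: 90 samples (rows) by 1567 genes (columns)
n_samples <- 90
n_genes <- 1567
di_matrix <- matrix(round(runif(n_samples * n_genes, 1, 100)), ncol = n_genes)

# get median per gene (one value per column)
genes_median <- apply(di_matrix, 2, median)

# convert to 0 and 1; sweep() compares each column with its own median,
# whereas a bare `di_matrix > genes_median` would recycle the medians down the rows
di_matrix01 <- ifelse(sweep(di_matrix, 2, genes_median, ">"), 1, 0)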

12.4 years ago by zx8754 12k
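
One detail worth checking when comparing a matrix against a vector of per-column medians: R recycles the vector down the columns, element by element, so the medians only line up with their own columns if the comparison is aligned explicitly. The tiny sketch below shows the difference on made-up numbers; `m`, `naive`, `aligned`, and `di` are illustrative names only.

```r
# Hypothetical toy data: 2 samples (rows) by 3 genes (columns).
# matrix() fills column by column, so the columns are (1, 9), (5, 2), (4, 8).
m <- matrix(c(1, 9, 5, 2, 4, 8), nrow = 2)

# One median per gene (column)
med <- apply(m, 2, median)

# Naive comparison: med is recycled down the columns, so entries of m
# get compared with medians that belong to other genes
naive <- m > med

# Aligned comparison: sweep() applies ">" between column j and med[j]
aligned <- sweep(m, 2, med, ">")

# Equivalent alternative: transpose so genes run down the rows, compare, transpose back
aligned_t <- t(t(m) > med)

# Convert TRUE/FALSE to 1/0
di <- aligned * 1
```

With an even number of distinct values per gene, exactly half of each column of `di` should be 1, which makes a handy sanity check on real data.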

score 2 · Answer 2 · 2013-02-26

2

12.4 years ago

fo3c ▴ 450

If to get the median you need to call exprs, don't you also need it in the comparison that fails? current_logical <- exprs(current_study[,i]) > medianVAL

12.4 years ago by fo3c ▴ 450

0

Yes, thank you. I forgot to put this in. It still doesn't work, though. Edited appropriately.

12.4 years ago by moranr ▴ 290
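
One possible reading of the remaining error, offered as a guess rather than a confirmed diagnosis: a Bioconductor ExpressionSet keeps probes in rows and samples in columns, so current_study[,i] would select sample i (1567 values) rather than gene i (90 values), and 1567 values cannot be assigned into a 90-row column of di_matrix. The sketch below uses a plain simulated matrix `expr` as a stand-in for exprs(current_study); `expr`, `gene_median`, and `above` are illustrative names, not objects from the original post.

```r
# Stand-in for exprs(current_study): probes in rows, samples in columns.
# Values are simulated; the real ones would come from the ExpressionSet.
set.seed(42)
n_genes <- 1567
n_samples <- 90
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)

# In this orientation, the median per gene is the median per ROW
gene_median <- apply(expr, 1, median)

# gene_median has one entry per row, so R's column-wise recycling
# pairs every row with its own median here
above <- expr > gene_median

# Transpose to samples x genes and store as 0/1, matching di_matrix's layout
di_matrix <- t(above) * 1

dim(di_matrix)
```

On the real data, rownames(di_matrix) and colnames(di_matrix) could then be set from sampleNames(current_study) and featureNames(current_study), as in the original post.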