microarray data processing
9.7 years ago
ewre ▴ 250

Hi, all

I have a question on microarray data processing. Here is what I have done:

We used the Illumina HumanHT-12 microarray to profile gene expression changes in ~100 samples. After processing with the lumi R package (background adjusted, variance stabilized, and normalized with "ssn"), I randomly selected two sets of genes (about 100 genes each) from the data matrix (15301 genes; 141 samples), took the median value of each gene set in every sample, and then plotted the values of one set against the other. To my surprise, I found a correlation between the two randomly selected gene sets. Could anyone explain this?

# dat is the expression matrix (genes in rows, samples in columns)

## generate random row indices
set1.index <- sample(nrow(dat), 100)
set2.index <- sample(nrow(dat), 100)

set1.dat <- dat[set1.index, ]
set2.dat <- dat[set2.index, ]

## take the median over the genes in each set, one value per sample
set1.aggr <- aggregate(set1.dat, by = list(set = rep(1, nrow(set1.dat))), FUN = median)
set2.aggr <- aggregate(set2.dat, by = list(set = rep(1, nrow(set2.dat))), FUN = median)

## reshape the data for plotting (drop the grouping column)
medi.dat <- rbind(set1.aggr[, -1], set2.aggr[, -1])

## plot it (unlist turns each one-row data frame into a numeric vector)
plot(unlist(medi.dat[1, ]), unlist(medi.dat[2, ]))

With many thanks.

9.7 years ago

You're comparing per-sample medians with themselves, so of course they show a correlation (otherwise, statistics would break). Presumably you meant to take the median of each gene across samples and compare those:

set1.agr <- apply(set1.dat, 1, median)
set2.agr <- apply(set2.dat, 1, median)
plot(set1.agr, set2.agr)
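
To illustrate why the per-sample medians in the question correlate, here is a minimal simulation sketch. It does not use the original data: the matrix size, the `sample.effect` offsets, the noise levels, and the helper `set.median.cor` are all made-up assumptions. Each sample gets an offset shared by every gene on the array (as residual intensity or batch differences might produce), and the per-sample medians of two random gene sets then track each other. Without that shared offset they should not.

```r
set.seed(1)
n.genes <- 2000
n.samples <- 141

# Hypothetical per-sample offset shared by every gene on an array
sample.effect <- rnorm(n.samples, sd = 0.5)

# Gene-specific baseline plus independent gene-by-sample noise
baseline <- rnorm(n.genes, mean = 8, sd = 1)
noise <- matrix(rnorm(n.genes * n.samples, sd = 0.3), n.genes, n.samples)

dat.shared <- baseline + noise + rep(sample.effect, each = n.genes)  # with shared offset
dat.indep <- baseline + noise                                        # without shared offset

# Correlation between the per-sample medians of two random gene sets
set.median.cor <- function(dat, size = 100) {
  s1 <- sample(nrow(dat), size)
  s2 <- sample(nrow(dat), size)
  m1 <- apply(dat[s1, ], 2, median)  # one median per sample (column)
  m2 <- apply(dat[s2, ], 2, median)
  cor(m1, m2)
}

cor.shared <- set.median.cor(dat.shared)
cor.indep <- set.median.cor(dat.indep)
print(c(shared = cor.shared, independent = cor.indep))
```

If a real data set behaves like `dat.shared`, any two gene sets will appear correlated across samples. If it behaves like `dat.indep`, they will not, which may be why the effect shows up in some data sets and not in others.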
Thanks for your reply, Devon Ryan. I think you mean that the 100 randomly selected genes can represent all ~20000 genes. That is reasonable, but it is not always the case: I have tried the code in this post on other independent data sets, and in some cases it shows no correlation at all.

Actually, this question was raised by an interesting hypothesis in my research. We observed that oxidative phosphorylation function was disturbed in our case samples, so we hypothesized that the expression profile of the oxidative phosphorylation (OXPHOS) genes must differ from the 'overall expression profile' (we use randomly selected genes to represent this overall profile, which is in accordance with your reply ~_~). To my surprise, we found that there is always a high correlation between the expression of the OXPHOS genes and that of randomly selected genes in my data. So I checked this hypothesis in other independent data sets, and it turns out that the phenomenon holds in some data sets but not in others.
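
One possible way to make this comparison less dependent on a single random draw, sketched under assumptions: compare the per-sample median profile of the OXPHOS set with the profile of all remaining genes, and then see where that correlation falls among many random sets of the same size. The names `oxphos.index` and `set.profile` are hypothetical, and the toy matrix below only stands in for a real normalized expression matrix.

```r
set.seed(1)

# Toy stand-in for the expression matrix (genes in rows, samples in columns),
# with a made-up per-sample offset added to every gene
dat <- matrix(rnorm(2000 * 50, mean = 8), nrow = 2000) +
  rep(rnorm(50, sd = 0.5), each = 2000)

# Hypothetical: row indices of the gene set of interest (e.g. OXPHOS genes)
oxphos.index <- 1:100

# Per-sample median over a set of rows
set.profile <- function(dat, idx) apply(dat[idx, , drop = FALSE], 2, median)

# "Overall" profile from all genes outside the set of interest
overall <- set.profile(dat, setdiff(seq_len(nrow(dat)), oxphos.index))
obs.cor <- cor(set.profile(dat, oxphos.index), overall)

# Null distribution: correlation of random same-sized sets with the overall profile
null.cor <- replicate(200, {
  idx <- sample(nrow(dat), length(oxphos.index))
  cor(set.profile(dat, idx), overall)
})

# Fraction of random sets that correlate with the overall profile
# no better than the gene set of interest does
p.low <- mean(null.cor <= obs.cor)
print(c(observed = obs.cor, fraction.not.higher = p.low))
```

A small `p.low` would suggest the gene set follows the overall profile less closely than random sets typically do. A high correlation on its own says little if random sets correlate just as strongly.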
