Question

Combining Two Platforms Affy Hgu133A And Hgu133B

2

12.4 years ago

mohan173bmc ▴ 20

Hello,

I need to reanalyze an old dataset available at GEO. The complication is that each sample in the experiment was hybridized to two platforms, HGU133A and HGU133B. Is there a way to combine data from two platforms when the same samples were run on both?

I ran preprocessing and MAS5 normalization separately for each platform and exported the results. I see that 168 probeset IDs are common to the two. One paper that faced a similar problem reports that the HGU133B values were scaled to HGU133A based on 100 genes common to both chips.

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0010696

I do not know how this is done or whether it is a valid approach. Is there another way to solve the problem? It would also be helpful to know the protocol.

Many thanks and best regards, mohan

microarray • 7.2k views

updated 10.9 years ago by oganm ▴ 60 • written 12.4 years ago by mohan173bmc ▴ 20

0

Many thanks for your replies, Sean, Istvan and fanofactor.

I have no idea how the scaling from HGU133B to HGU133A was performed, so I cannot follow it. Suggestions for methods are welcome. I hope these genes do not change across experiments (by the way, the dataset consists of cancer tissue samples from a clinical cohort).
Although combining the two files by rows is a good way to find differentially regulated genes, I suspect it is not suitable for a coexpression analysis. I would also like the final combined file to be fit for a meta-analysis across two different clinical cohorts.
I will certainly check whether "combineaffy" can be used.

thank you all again.

best,
mohan

updated 5.3 years ago by Ram 45k • written 12.4 years ago by mohan173bmc ▴ 20

1

With regard to #3, combineAffyBatch is for combining arrays that share content. Hgu133a and hgu133b do not share content, so this will be equivalent to combining rows.

written 12.4 years ago by Sean Davis 27k

0

With regard to #2, note that correlation is scale-free. Try this experiment:

> dat = seq(0,10,1) + rnorm(99)
> dat2 = seq(0,10,1) + rnorm(99)
> cor(dat,dat2)
[1] 0.8928766
> cor(dat,dat2*20)
[1] 0.8928766

That being the case, scaling the rows will not change a "co-expression" analysis that relies on correlation (and most do).
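
To make the same point on a matrix rather than on two vectors, here is a small sketch with simulated values. The matrix names, sizes and the factor of 20 are invented for illustration. Multiplying all rows from one platform by a positive constant leaves every gene-gene Pearson correlation unchanged.

```r
set.seed(1)                                   # reproducible toy data
n_samples <- 12

# Hypothetical expression matrices: rows = probesets, columns = samples
exprA <- matrix(rnorm(5 * n_samples, mean = 8), nrow = 5,
                dimnames = list(paste0("A_", 1:5), paste0("S", 1:n_samples)))
exprB <- matrix(rnorm(4 * n_samples, mean = 6), nrow = 4,
                dimnames = list(paste0("B_", 1:4), paste0("S", 1:n_samples)))

# Gene-gene correlations across samples, before and after rescaling platform B
cor_raw    <- cor(t(rbind(exprA, exprB)))
cor_scaled <- cor(t(rbind(exprA, exprB * 20)))

all.equal(cor_raw, cor_scaled)                # TRUE up to floating-point error
```

The same holds for an additive shift, which is what a multiplicative scaling of raw intensities becomes after a log transform.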

updated 5.3 years ago by Ram 45k • written 12.4 years ago by Sean Davis 27k

score 5 · Answer 1 · 2012-11-09

5

12.4 years ago

Sean Davis 27k

Normalize the data from the two platforms separately and then combine by just combining the rows from one platform to the other. Since testing for differential expression is done per-gene, there is not a need to have 133a and 133b "combined" in any formal way.
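
A minimal sketch of the row-wise combination in base R, assuming each platform has already been normalized into a matrix with probesets in rows and samples in columns. The probeset IDs, sample names and values are invented, and dropping the shared probesets from the second chip is one possible choice rather than part of the advice above.

```r
# Hypothetical normalized outputs; column names identify the biological sample
exprA <- matrix(1:6, nrow = 2,
                dimnames = list(c("geneA1_at", "shared_at"), c("P1", "P2", "P3")))
exprB <- matrix(7:12, nrow = 2,
                dimnames = list(c("geneB1_at", "shared_at"), c("P3", "P1", "P2")))

# Put the B columns in the same sample order as A before stacking
stopifnot(setequal(colnames(exprA), colnames(exprB)))
exprB <- exprB[, colnames(exprA), drop = FALSE]

# Keep shared probesets only once (from A here) so row names stay unique
exprB <- exprB[!rownames(exprB) %in% rownames(exprA), , drop = FALSE]

combined <- rbind(exprA, exprB)
dim(combined)
```

Matching the columns by sample before `rbind` matters because the CEL files for the two chips are not guaranteed to be read in the same order.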

Ram · Answer 2 · 2014-05-22

This is an old thread, but we managed to solve this issue and I'd like to share the solution so that no one else has to suffer.

I assume the two chips you use share some probesets and that you only want to work with the shared ones, since you can't really compare probes that don't exist on both chips.

You need to tinker with the code of the rma function. You can view the code by Ctrl + clicking on rma in your script, but I'll try to be clear enough here that you won't need to.

Edit: do not load the gdata package beforehand.

The critical parts are 1):

exprs <- .Call("rma_c_complete_copy", pm(object, subset),
          pNList, ngenes, normalize, background, bgversion,
          verbose, PACKAGE = "affy")

which outputs the resulting expression values, and 2):

new("ExpressionSet", phenoData = phenoData(object), annotation = annotation(object), 
    protocolData = protocolData(object), experimentData = experimentData(object), 
    exprs = exprs)

which creates the object that the function returns.

In part 1, normalize, background, bgversion and verbose normally come from the arguments of the original rma function. Fill them in as you normally would. To use the defaults, set

verbose = TRUE
destructive = TRUE   # not passed to .Call; in rma it only selects rma_c_complete vs rma_c_complete_copy
normalize = TRUE
background = TRUE    # needed by the .Call below
bgversion = 2

There are 3 things you need to create manually: pNList, ngenes and pm(object, subset). Read both groups separately. I just placed them in different directories:

setwd('HGU133a')
affyA <- ReadAffy()
setwd('..')
setwd('HGU133b')
affyB <- ReadAffy()
setwd('..')

You want the probes common to both chips, so do

pNListA = probeNames(affyA)
pNListB = probeNames(affyB)  # probe names from the second chip
subsetList = pNListA[pNListA %in% pNListB]

Now that you have the shared probesets, you can request the PMs of both batches and stitch them together

subsetPmA = pm(affyA, unique(subsetList))
subsetPmB = pm(affyB, unique(subsetList))
allPm = cbind(subsetPmA, subsetPmB) # this will go into the .Call function

The two variables left for part 1 are simple; just do

ngenes = length(unique(subsetList))
pNList = split(0:(length(subsetList) - 1), subsetList)

and run part 1 to get normalized expression values

exprs <- .Call("rma_c_complete", allPm, 
               pNList, ngenes, normalize, background, bgversion, 
               verbose, PACKAGE = "affy")
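
For readers unsure what `pNList` looks like, here is a toy illustration in base R with invented probeset names. `split` groups the zero-based row positions of the PM matrix by probeset name, presumably zero-based because the list is handed to C code.

```r
# Invented probe-to-probeset labels, one entry per row of the PM matrix
probe_labels <- c("ps1_at", "ps1_at", "ps1_at", "ps2_at", "ps2_at")

# Zero-based row positions grouped by probeset
pNList_toy <- split(0:(length(probe_labels) - 1), probe_labels)
str(pNList_toy)

# ngenes is simply the number of distinct probesets
ngenes_toy <- length(unique(probe_labels))

# Basic consistency checks: one list element per probeset,
# and every row of the PM matrix assigned exactly once
stopifnot(length(pNList_toy) == ngenes_toy,
          sum(lengths(pNList_toy)) == length(probe_labels))
```

The grouping only lines up with the PM matrix if the labels are in the same order as its rows, so a check like `nrow(allPm) == length(subsetList)` before the `.Call` is a cheap safeguard.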

To create the new object you need to stitch the components of the two objects together. For the annotation, if one of your chips has a subset of the probes on the other chip, just use that one, but I don't think it matters much; you have what you need at this point. Just don't mix up the order when you are using combine: it must match the column order used in cbind above (A first, then B).

phenoD = combine(phenoData(affyA), phenoData(affyB))
annot =  annotation(affyA)
protocolD = combine(protocolData(affyA), protocolData(affyB))
experimentD = experimentData(affyA)

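# column names of exprs must match the sample names in phenoD
# (assumed to be required here; rma does the equivalent with sampleNames(object))
colnames(exprs) = sampleNames(phenoD)

newNormalized = new("ExpressionSet", phenoData = phenoD, annotation = annot, 
    protocolData = protocolD, experimentData = experimentD, 
    exprs = exprs)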

That's it. You now have your handcrafted rma output. Use it as you normally would.

score 1 · Answer 3 · 2012-11-09

I suspect that there is no standardized way to do this - but if you have already found a published method I would go with that - and cite it.

In the end it all depends on whether the genes used for normalization do actually represent genes that don't change across the experiments. That's probably the critical element.
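
As a rough sketch of what such a scaling could look like: for each sample, compute a factor from the probesets present on both chips and multiply the HGU133B values by it. This is an assumption about the general idea, not the protocol of the cited paper, and all names and numbers below are invented.

```r
set.seed(42)
samples <- paste0("S", 1:4)
shared  <- paste0("ctrl_", 1:10)              # invented shared probeset ids

# Invented MAS5-like intensities; chip B is simulated 3x brighter than chip A
base  <- matrix(rlnorm(10 * 4, meanlog = 6), nrow = 10,
                dimnames = list(shared, samples))
exprA <- rbind(base, matrix(rlnorm(5 * 4, meanlog = 6), nrow = 5,
                            dimnames = list(paste0("a_", 1:5), samples)))
exprB <- rbind(base * 3, matrix(rlnorm(5 * 4, meanlog = 6) * 3, nrow = 5,
                                dimnames = list(paste0("b_", 1:5), samples)))

# Per-sample factor: ratio of the shared-probeset means (a trimmed mean or
# median would be more robust to outliers)
common  <- intersect(rownames(exprA), rownames(exprB))
factors <- colMeans(exprA[common, ]) / colMeans(exprB[common, ])

# Multiply each column (sample) of B by its own factor
exprB_scaled <- sweep(exprB, 2, factors, `*`)
round(factors, 3)
```

The factors are only meaningful if the shared probesets really are stable across samples, which is the caveat raised above. Plotting the shared probesets from A against B for each sample is a quick way to check that.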

score 1 · Answer 4 · 2012-11-09

1

12.4 years ago

fanofactor ▴ 30

The package matchprobes for R (I am not sure it is maintained) has a function (combineAffyBatch) to combine different chips. It combines the probes by sequence, not by id.
