How To Merge Two Microarrays Datasets?
11.5 years ago
fbrundu ▴ 350

Hi all, I am trying to merge two microarray datasets, as in this paper. I do not understand how to do it, because the two datasets do not share the same set of sample names, and I did not find any field that relates one dataset to the other. Any hint on how to do it?

The two datasets are this and this.

Thanks

microarray merge dataset • 9.7k views

added answer*


How can we integrate GSE series from different GPL platforms?


I am also interested in integrating GSE series from different GPL platforms (GPL96 vs. GPL3921). Did you find a solution to your problem?

10.6 years ago
alaincoletta ▴ 170

InSilico DB provides an R/Bioconductor package, inSilicoMerging, for combining public datasets from GEO. If you are not using R, you can also combine data on the online platform (see this short step-by-step tutorial).

Example:

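# Load the InSilico DB client and the merging package
library(inSilicoDb)
library(inSilicoMerging)

# Retrieve two datasets measured on the same platform (GPL96)
eset1 = getDataset(gse="GSE10072", gpl="GPL96", norm="ORIGINAL", genes=TRUE);
eset2 = getDataset(gse="GSE7670", gpl="GPL96", norm="ORIGINAL", genes=TRUE);

# Combine them without any batch effect correction
esets = list(eset1, eset2);
eset = merge(esets, method="NONE");

# Plot the merged samples, labeled by disease status and by source study
plotMDS(eset, targetAnnot="Disease", batchAnnot="Study");

InSilico DB packages several batch effect removal methods, so the merge() call with method="NONE" could be replaced with: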

eset = merge(esets, method="XPN");

# or

eset = merge(esets, method="COMBAT");
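To give an intuition for what a batch effect removal step does, here is a minimal base R sketch of the simplest possible adjustment: centering each probe on zero within each batch. This is not XPN or ComBat and not what the merging package does internally; the matrix values, the `batch` labels, and the helper `center_by_batch` are all made up for illustration.

```r
# Toy expression matrix: 3 probes (rows) x 6 samples (columns).
# Values and names are hypothetical.
expr <- matrix(
  c(5, 6, 7,  8, 9, 10,
    2, 2, 2,  4, 4, 4,
    1, 3, 2,  6, 8, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("probe", 1:3), paste0("s", 1:6))
)
batch <- factor(c("A", "A", "A", "B", "B", "B"))

# Subtract each probe's per-batch mean, so every probe has mean zero
# within every batch (a location-only adjustment).
center_by_batch <- function(x, batch) {
  out <- x
  for (b in levels(batch)) {
    cols <- batch == b
    sub <- x[, cols, drop = FALSE]
    out[, cols] <- sub - rowMeans(sub)
  }
  out
}

centered <- center_by_batch(expr, batch)
```

Note that this kind of centering also removes any real biological difference that happens to be confounded with the batch, which is one reason more careful methods exist.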

Hope this helps.

For more information, see the Bioinformatics paper, the InSilico DB and InSilico Merging package pages, and the blog post.

Tutorial example: https://insilicodb.org/the-impact-of-batch-effects-when-merging-different-data-sets/

R-Bioconductor packages:

11.5 years ago

You can certainly download all the .CEL files and normalize them together. However, hypothesis testing on the combined data may be challenging, since there will likely be a batch effect between the two datasets.
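One way to check whether such a batch effect is present after combining is to look at a PCA of the samples. The sketch below uses simulated data rather than real .CEL files; the matrix sizes, the shift of 3 applied to half of the probes, and all names are assumptions made for illustration.

```r
set.seed(1)
n_probes <- 200

# Two simulated studies of 5 samples each. Study 2 gets a constant shift
# on half of the probes to mimic a batch effect.
study1 <- matrix(rnorm(n_probes * 5, mean = 8), nrow = n_probes)
study2 <- matrix(rnorm(n_probes * 5, mean = 8), nrow = n_probes)
study2[1:100, ] <- study2[1:100, ] + 3

combined <- cbind(study1, study2)
colnames(combined) <- c(paste0("study1_", 1:5), paste0("study2_", 1:5))
study <- rep(c("study1", "study2"), each = 5)

# prcomp expects samples in rows, so transpose the probes x samples matrix.
pca <- prcomp(t(combined), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# TRUE if the two studies do not overlap at all along the first component,
# i.e. the study of origin dominates the variance in the data.
separated <- max(pc1[study == "study1"]) < min(pc1[study == "study2"]) ||
  max(pc1[study == "study2"]) < min(pc1[study == "study1"])
```

With real data, plotting `pca$x[, 1:2]` colored by study gives the same information visually: samples clustering by study rather than by phenotype suggest a batch effect.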

10.6 years ago

In theory, if you have two raw expression sets from the same array model, you can simply bind one to the other (matching rows by probe). However, doing this creates a whole world of problems, and there would have to be a very good justification for it. The first problem is the batch effect between the two datasets, as previously mentioned. If you manage to correct for that successfully, you might, and only might, get some meaningful results out of the analysis. This is a post-experimental design decision and is not recommended.
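The binding step itself can be sketched in base R as follows. The two matrices, the probe IDs, and the sample names are hypothetical; the point is that rows are matched by probe ID, so the sample (column) names of the two datasets do not need to have anything in common.

```r
# Two toy probes x samples matrices from the same array model.
# Probe IDs are the row names; sample names differ between the sets.
set1 <- matrix(1:6, nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("GSM_a1", "GSM_a2")))
set2 <- matrix(11:16, nrow = 3,
               dimnames = list(c("p3", "p1", "p4"), c("GSM_b1", "GSM_b2")))

# Keep only probes present in both sets, in one fixed order.
common <- intersect(rownames(set1), rownames(set2))

# Index both matrices by probe ID so rows line up before binding columns.
merged <- cbind(set1[common, , drop = FALSE], set2[common, , drop = FALSE])

# Record which set each column came from, for later batch handling.
study <- rep(c("set1", "set2"), times = c(ncol(set1), ncol(set2)))
```

Keeping the `study` vector alongside the merged matrix makes it possible to include the dataset of origin as a covariate or to apply a batch correction afterwards.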
