Merged analysis of microarray datasets using R
2
2
Entering edit mode
5 months ago
ruth ▴ 20

I want to merge three Affymetrix datasets (GSE189788, GSE141804, GSE21942) and perform a combined data analysis to identify DEGs. Kindly suggest an R library package to use for the same

Thanks in advance!

R meta-analysis microarray • 485 views
ADD COMMENT
5
Entering edit mode
5 months ago

Something like this could get you started:

library(GEOquery)

GSE189788 <- getGEO("GSE189788")
GSE189788_eset <- GSE189788$GSE189788_series_matrix.txt.gz

GSE141804 <- getGEO("GSE141804")
GSE141804_U133A_eset <- GSE141804$`GSE141804-GPL96_series_matrix.txt.gz`
GSE141804_U133A2_eset <- GSE141804$`GSE141804-GPL571_series_matrix.txt.gz`

GSE21942 <- getGEO("GSE21942")
GSE21942_eset <- GSE21942$GSE21942_series_matrix.txt.gz

GSE189788_eset@annotation
GSE141804_U133A_eset@annotation
GSE141804_U133A2_eset@annotation
GSE21942_eset@annotation

common.probes <- Reduce(intersect, list(rownames(GSE189788_eset),
                                        rownames(GSE141804_U133A_eset),
                                        rownames(GSE141804_U133A2_eset),
                                        rownames(GSE21942_eset)))

GSE189788_eset <- GSE189788_eset[common.probes,]
GSE141804_U133A_eset <- GSE141804_U133A_eset[common.probes,]
GSE141804_U133A2_eset <- GSE141804_U133A2_eset[common.probes,]
GSE21942_eset <- GSE21942_eset[common.probes,]

View(pData(GSE21942_eset))

combo_pd <- data.frame("Dataset" = c(rep("GSE189788", ncol(GSE189788_eset)),
                         rep("GSE141804", ncol(GSE141804_U133A_eset)),
                         rep("GSE141804", ncol(GSE141804_U133A2_eset)),
                         rep("GSE21942", ncol(GSE21942_eset))),
           "Group" = c(rep("MS", ncol(GSE189788_eset)),
                       gsub("PBMC from ", "", GSE141804_U133A_eset$source_name_ch1),
                       gsub("PBMC from ", "", GSE141804_U133A2_eset$source_name_ch1),
                       ifelse(gsub("disease state: ", "", GSE21942_eset$characteristics_ch1.1)=="healthy", "HC", "MS")),
           "Age" = c(GSE189788_eset$`age(years):ch1`,
                     GSE141804_U133A_eset$`age (years):ch1`,
                     GSE141804_U133A2_eset$`age (years):ch1`,
                     rep(NA, ncol(GSE21942_eset))),
           "Sex" = c(GSE189788_eset$`gender:ch1`,
                     GSE141804_U133A_eset$`gender:ch1`,
                     GSE141804_U133A2_eset$`gender:ch1`,
                     rep(NA, ncol(GSE21942_eset))))

combo_pd$Age <- as.numeric(combo_pd$Age)
combo_pd$Sex <- substr(toupper(combo_pd$Sex), 1, 1)

library(Biobase)

combo_eset <- ExpressionSet(cbind(exprs(GSE189788_eset),
                                  exprs(GSE141804_U133A_eset),
                                  exprs(GSE141804_U133A2_eset),
                                  exprs(GSE21942_eset)))
pData(combo_eset) <- combo_pd

library(Rtsne)

tsne.res <- Rtsne(t(exprs(combo_eset)),
                  perplexity = 15,
                  theta = 0,
                  check_duplicates = FALSE)$Y
gg_df <- data.frame("TSNE1" = tsne.res[,1],
                    "TSNE2" = tsne.res[,2],
                    "Dataset" = combo_pd$Dataset,
                    "Group" = combo_pd$Group)
library(patchwork)
library(ggplot2)

g1 <- ggplot(gg_df,
       mapping = aes(x = TSNE1, y = TSNE2, colour = Dataset)) +
  geom_point() +
  theme_bw() + theme(aspect.ratio = 1) +
  scale_colour_manual(values = pals::brewer.set1(3))

g2 <- ggplot(gg_df,
             mapping = aes(x = TSNE1, y = TSNE2, colour = Group)) +
  geom_point() +
  theme_bw() + theme(aspect.ratio = 1) +
  scale_colour_manual(values = pals::brewer.dark2(5))

g1 + g2

You have a fair amount of batch effect which you will need to address before you do any differential expression. Realistically, you should analyse each dataset separately if possible. TSNE RES

Once you have that addressed you can use limma for differential expression and the vignette for that is pretty expansive so you will be able to find what you need there.

ADD COMMENT
1
Entering edit mode
5 months ago

There may be better methods by now available, but the one I know is fRMA. However, mind that this package is only meant to normalize for batch effects. The samples you pool should still be generated according to the same protocol and use the same array type and ideally reader.

ADD COMMENT

Login before adding your answer.

Traffic: 1934 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6