Question

Merged analysis of microarray datasets using R

2

Entering edit mode

5 months ago

ruth ▴ 20

I want to merge three Affymetrix datasets (GSE189788, GSE141804, GSE21942) and perform a combined data analysis to identify DEGs. Kindly suggest an R library package to use for the same

Thanks in advance!

R meta-analysis microarray • 483 views

ADD COMMENT • link updated 5 months ago by yura.grabovska ▴ 690 • written 5 months ago by ruth ▴ 20

score 5 · Answer 1 · 2024-07-04

Something like this could get you started:

library(GEOquery)

GSE189788 <- getGEO("GSE189788")
GSE189788_eset <- GSE189788$GSE189788_series_matrix.txt.gz

GSE141804 <- getGEO("GSE141804")
GSE141804_U133A_eset <- GSE141804$`GSE141804-GPL96_series_matrix.txt.gz`
GSE141804_U133A2_eset <- GSE141804$`GSE141804-GPL571_series_matrix.txt.gz`

GSE21942 <- getGEO("GSE21942")
GSE21942_eset <- GSE21942$GSE21942_series_matrix.txt.gz

GSE189788_eset@annotation
GSE141804_U133A_eset@annotation
GSE141804_U133A2_eset@annotation
GSE21942_eset@annotation

common.probes <- Reduce(intersect, list(rownames(GSE189788_eset),
                                        rownames(GSE141804_U133A_eset),
                                        rownames(GSE141804_U133A2_eset),
                                        rownames(GSE21942_eset)))

GSE189788_eset <- GSE189788_eset[common.probes,]
GSE141804_U133A_eset <- GSE141804_U133A_eset[common.probes,]
GSE141804_U133A2_eset <- GSE141804_U133A2_eset[common.probes,]
GSE21942_eset <- GSE21942_eset[common.probes,]

View(pData(GSE21942_eset))

combo_pd <- data.frame("Dataset" = c(rep("GSE189788", ncol(GSE189788_eset)),
                         rep("GSE141804", ncol(GSE141804_U133A_eset)),
                         rep("GSE141804", ncol(GSE141804_U133A2_eset)),
                         rep("GSE21942", ncol(GSE21942_eset))),
           "Group" = c(rep("MS", ncol(GSE189788_eset)),
                       gsub("PBMC from ", "", GSE141804_U133A_eset$source_name_ch1),
                       gsub("PBMC from ", "", GSE141804_U133A2_eset$source_name_ch1),
                       ifelse(gsub("disease state: ", "", GSE21942_eset$characteristics_ch1.1)=="healthy", "HC", "MS")),
           "Age" = c(GSE189788_eset$`age(years):ch1`,
                     GSE141804_U133A_eset$`age (years):ch1`,
                     GSE141804_U133A2_eset$`age (years):ch1`,
                     rep(NA, ncol(GSE21942_eset))),
           "Sex" = c(GSE189788_eset$`gender:ch1`,
                     GSE141804_U133A_eset$`gender:ch1`,
                     GSE141804_U133A2_eset$`gender:ch1`,
                     rep(NA, ncol(GSE21942_eset))))

combo_pd$Age <- as.numeric(combo_pd$Age)
combo_pd$Sex <- substr(toupper(combo_pd$Sex), 1, 1)

library(Biobase)

combo_eset <- ExpressionSet(cbind(exprs(GSE189788_eset),
                                  exprs(GSE141804_U133A_eset),
                                  exprs(GSE141804_U133A2_eset),
                                  exprs(GSE21942_eset)))
pData(combo_eset) <- combo_pd

library(Rtsne)

tsne.res <- Rtsne(t(exprs(combo_eset)),
                  perplexity = 15,
                  theta = 0,
                  check_duplicates = FALSE)$Y
gg_df <- data.frame("TSNE1" = tsne.res[,1],
                    "TSNE2" = tsne.res[,2],
                    "Dataset" = combo_pd$Dataset,
                    "Group" = combo_pd$Group)
library(patchwork)
library(ggplot2)

g1 <- ggplot(gg_df,
       mapping = aes(x = TSNE1, y = TSNE2, colour = Dataset)) +
  geom_point() +
  theme_bw() + theme(aspect.ratio = 1) +
  scale_colour_manual(values = pals::brewer.set1(3))

g2 <- ggplot(gg_df,
             mapping = aes(x = TSNE1, y = TSNE2, colour = Group)) +
  geom_point() +
  theme_bw() + theme(aspect.ratio = 1) +
  scale_colour_manual(values = pals::brewer.dark2(5))

g1 + g2

You have a fair amount of batch effect which you will need to address before you do any differential expression. Realistically, you should analyse each dataset separately if possible. TSNE RES

Once you have that addressed you can use limma for differential expression and the vignette for that is pretty expansive so you will be able to find what you need there.

score 1 · Answer 2 · 2024-07-03

1

Entering edit mode

5 months ago

Matthias Zepper 5.0k

There may be better methods by now available, but the one I know is fRMA. However, mind that this package is only meant to normalize for batch effects. The samples you pool should still be generated according to the same protocol and use the same array type and ideally reader.

ADD COMMENT • link 5 months ago by Matthias Zepper 5.0k