I want to merge three Affymetrix datasets (GSE189788, GSE141804, GSE21942) and perform a combined data analysis to identify DEGs. Kindly suggest an R library package to use for the same
Thanks in advance!
I want to merge three Affymetrix datasets (GSE189788, GSE141804, GSE21942) and perform a combined data analysis to identify DEGs. Kindly suggest an R library package to use for the same
Thanks in advance!
Something like this could get you started:
library(GEOquery)
GSE189788 <- getGEO("GSE189788")
GSE189788_eset <- GSE189788$GSE189788_series_matrix.txt.gz
GSE141804 <- getGEO("GSE141804")
GSE141804_U133A_eset <- GSE141804$`GSE141804-GPL96_series_matrix.txt.gz`
GSE141804_U133A2_eset <- GSE141804$`GSE141804-GPL571_series_matrix.txt.gz`
GSE21942 <- getGEO("GSE21942")
GSE21942_eset <- GSE21942$GSE21942_series_matrix.txt.gz
GSE189788_eset@annotation
GSE141804_U133A_eset@annotation
GSE141804_U133A2_eset@annotation
GSE21942_eset@annotation
common.probes <- Reduce(intersect, list(rownames(GSE189788_eset),
rownames(GSE141804_U133A_eset),
rownames(GSE141804_U133A2_eset),
rownames(GSE21942_eset)))
GSE189788_eset <- GSE189788_eset[common.probes,]
GSE141804_U133A_eset <- GSE141804_U133A_eset[common.probes,]
GSE141804_U133A2_eset <- GSE141804_U133A2_eset[common.probes,]
GSE21942_eset <- GSE21942_eset[common.probes,]
View(pData(GSE21942_eset))
combo_pd <- data.frame("Dataset" = c(rep("GSE189788", ncol(GSE189788_eset)),
rep("GSE141804", ncol(GSE141804_U133A_eset)),
rep("GSE141804", ncol(GSE141804_U133A2_eset)),
rep("GSE21942", ncol(GSE21942_eset))),
"Group" = c(rep("MS", ncol(GSE189788_eset)),
gsub("PBMC from ", "", GSE141804_U133A_eset$source_name_ch1),
gsub("PBMC from ", "", GSE141804_U133A2_eset$source_name_ch1),
ifelse(gsub("disease state: ", "", GSE21942_eset$characteristics_ch1.1)=="healthy", "HC", "MS")),
"Age" = c(GSE189788_eset$`age(years):ch1`,
GSE141804_U133A_eset$`age (years):ch1`,
GSE141804_U133A2_eset$`age (years):ch1`,
rep(NA, ncol(GSE21942_eset))),
"Sex" = c(GSE189788_eset$`gender:ch1`,
GSE141804_U133A_eset$`gender:ch1`,
GSE141804_U133A2_eset$`gender:ch1`,
rep(NA, ncol(GSE21942_eset))))
combo_pd$Age <- as.numeric(combo_pd$Age)
combo_pd$Sex <- substr(toupper(combo_pd$Sex), 1, 1)
library(Biobase)
combo_eset <- ExpressionSet(cbind(exprs(GSE189788_eset),
exprs(GSE141804_U133A_eset),
exprs(GSE141804_U133A2_eset),
exprs(GSE21942_eset)))
pData(combo_eset) <- combo_pd
library(Rtsne)
tsne.res <- Rtsne(t(exprs(combo_eset)),
perplexity = 15,
theta = 0,
check_duplicates = FALSE)$Y
gg_df <- data.frame("TSNE1" = tsne.res[,1],
"TSNE2" = tsne.res[,2],
"Dataset" = combo_pd$Dataset,
"Group" = combo_pd$Group)
library(patchwork)
library(ggplot2)
g1 <- ggplot(gg_df,
mapping = aes(x = TSNE1, y = TSNE2, colour = Dataset)) +
geom_point() +
theme_bw() + theme(aspect.ratio = 1) +
scale_colour_manual(values = pals::brewer.set1(3))
g2 <- ggplot(gg_df,
mapping = aes(x = TSNE1, y = TSNE2, colour = Group)) +
geom_point() +
theme_bw() + theme(aspect.ratio = 1) +
scale_colour_manual(values = pals::brewer.dark2(5))
g1 + g2
You have a fair amount of batch effect which you will need to address before you do any differential expression. Realistically, you should analyse each dataset separately if possible.
Once you have that addressed you can use limma
for differential expression and the vignette for that is pretty expansive so you will be able to find what you need there.
There may be better methods by now available, but the one I know is fRMA. However, mind that this package is only meant to normalize for batch effects. The samples you pool should still be generated according to the same protocol and use the same array type and ideally reader.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.