Question

How to pre-process raw microarray data?

13 months ago

egascon ▴ 60

Hello,

I am working with the GSE140829 dataset, which provides microarray expression data in two .txt formats:

Raw data

Normalized data

Number of samples: 587

I am trying to learn how to do my own QC and pre-processing to get from the raw data to that normalized data, and to see whether I get similar values.

So far I have the metadata in one variable (info.mod) and the expression data in another (data.mod).

The expression data has 2348 sample columns in total (2348/4 = 587), which, if I understood correctly, are 4 channels for each sample of the microarray. How do I go on from here?
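One thing worth checking first: raw exports from Illumina BeadChip arrays often carry several columns per sample (for example a signal, a bead standard error, a bead count and a detection p-value) rather than four channels. Whether that applies to this file is an assumption that has to be verified against names(data). A minimal sketch on toy data, with hypothetical column suffixes `.AVG_Signal` and `.Detection.Pval` and a hypothetical helper `get_block`:

```r
# Toy stand-in for the raw table. The column suffixes are hypothetical and
# must be replaced by whatever names(data) actually shows.
raw <- data.frame(
  SYMBOL            = c("A", "B", "C"),
  S1.AVG_Signal     = c(100, 200, 300),
  S1.Detection.Pval = c(0.001, 0.20, 0.01),
  S2.AVG_Signal     = c(110, 190, 310),
  S2.Detection.Pval = c(0.002, 0.30, 0.02)
)

# Pull out one column type per sample by its suffix and name the
# resulting matrix columns by sample ID only.
get_block <- function(df, suffix) {
  cols <- names(df)[endsWith(names(df), suffix)]
  m <- as.matrix(df[, cols, drop = FALSE])
  colnames(m) <- substr(cols, 1, nchar(cols) - nchar(suffix))
  m
}

signal <- get_block(raw, ".AVG_Signal")      # one intensity column per sample
detp   <- get_block(raw, ".Detection.Pval")  # matching detection p-values

# Raw intensities are usually log2-transformed before any further step.
signal_log2 <- log2(signal)
```

If the file does turn out to be an Illumina export, limma's `read.ilmn()` and `neqc()` are the usual route for reading and background-correcting/normalizing it, so the manual split above would mainly serve for inspection.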

I have also noticed that some gene symbols in the SYMBOL column appear two or three times (maybe replicate probes). What do I have to do to get one row of expression data per SYMBOL for each sample?
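Several probes mapping to the same gene symbol is common on expression arrays. Two frequent choices are to average the probes (limma's `avereps()`) or to keep, per symbol, the probe with the highest mean expression (the `MaxMean` method of `WGCNA::collapseRows()`). The second option can be sketched in base R as follows; the matrix and symbols are toy values, not taken from the dataset:

```r
# Toy expression matrix: 4 probes x 2 samples, with GENE1 measured twice.
expr <- matrix(
  c(5, 6,
    9, 8,
    3, 4,
    7, 7),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2"))
)
symbol <- c("GENE1", "GENE1", "GENE2", "GENE3")

# Rank probes from highest to lowest mean, then keep the first hit per symbol.
probe_mean <- rowMeans(expr)
ord  <- order(probe_mean, decreasing = TRUE)
keep <- sort(ord[!duplicated(symbol[ord])])  # sort() restores the original row order

expr_gene <- expr[keep, , drop = FALSE]
rownames(expr_gene) <- symbol[keep]
```

Probes with an empty or missing symbol would need to be dropped before this step, otherwise they all collapse into a single row.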

Here is what I have written in R:

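library(GEOquery)
library(Biobase)
library(dplyr)

# Metadata
# getGEO() with GSEMatrix = TRUE returns a list of ExpressionSets;
# take the first element (assumed here to be the only one).
GSE140829 <- getGEO('GSE140829', GSEMatrix = TRUE)[[1]]
varLabels(GSE140829)
info <- pData(GSE140829)
info <- info[, c(1, 2, 41:42)]

# Modify the title to keep only the sample ID, so that it matches the IDs in the expression data.
info.mod <-
  info %>%
  mutate(title = gsub("Whole blood,", " ", title)) %>%
  mutate(title = gsub(".*,", " ", title)) %>%
  mutate(title = gsub("\\[|\\]", " ", title)) %>%
  mutate(title = gsub("ad_mci", " ", title)) %>%
  mutate(title = trimws(title)) %>%  # drop the spaces left behind by the substitutions
  as.data.frame()

# Expression data
data <- read.delim('~/GSE140829_raw_data.txt', header = TRUE)
data[1:10, 1:50]
# Keep column 2 plus the 2348 per-sample columns (14:2361).
data.mod <- data[, c(2, 14:2361)]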

And here I am stuck.

My goal is to rename the samples so that they use the same IDs as info.mod, but first I need to know what to do with the duplicate probes and the 4 columns per sample.
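The renaming step itself can be done with `match()` once each sample is down to a single column. The sketch below uses made-up IDs and assumes the cleaned titles are numeric-looking sample IDs; `read.delim()` prefixes such column names with `X` unless `check.names = FALSE` is set, so that prefix is removed first:

```r
# Toy metadata and expression matrix. All IDs here are made up.
meta <- data.frame(
  title = c(" 1001 ", "1002", " 1003"),
  row.names = c("GSM_a", "GSM_b", "GSM_c")
)
expr <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("g1", "g2"), c("X1003", "X1001", "X1002"))
)

# Clean both sides so that the IDs are directly comparable.
meta$sample_id <- trimws(meta$title)
colnames(expr) <- sub("^X", "", colnames(expr))

# Reorder the expression columns to follow the metadata rows and fail
# loudly if any sample cannot be found.
idx <- match(meta$sample_id, colnames(expr))
stopifnot(!anyNA(idx))
expr <- expr[, idx, drop = FALSE]
colnames(expr) <- rownames(meta)
```

Keeping the metadata rows and expression columns in the same order matters later, because WGCNA takes a samples-by-genes matrix (the transpose of `expr`) alongside a trait table in matching row order.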

The final objective is to run a WGCNA analysis.

Thank you for your help,

limma microarray WGCNA • 442 views

updated 13 months ago by GenoMax 152k • written 13 months ago by egascon ▴ 60