Question

Combining microarray expression data

3.0 years ago

SnehaS • 0

Hello Fellow Scientists,

I have 5 microarray datasets from different platforms. Each dataset contains disease and healthy samples. A few datasets have only 4 disease and 3 healthy samples, while others have more. I want to run ML algorithms on them and, since ML requires a large number of samples, I tried to find a way to combine these datasets. Here is what I did, and I would like to know whether this method is correct.

I concatenated the expression matrices (gcrma / neqc normalized) of all datasets into one, keeping only the genes measured in every dataset. This gave around 8000 genes as rows and 200 samples as columns.
I used the scale() function in R to convert the expression values into z-scores.
I then used this z-score matrix and a few gene signatures as input for GSVA.
The GSVA output (gene signatures as rows, samples as columns, enrichment scores between -1 and 1) was used as input for ML.
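
As a rough illustration of the first two steps, here is a minimal Python sketch (not the R code actually used). It assumes each dataset is a dict mapping a gene name to a list of per-sample expression values; the gene names, values, and function names are all hypothetical. One point it makes explicit: R's scale() centers and scales the columns of a matrix, so on a genes-by-samples matrix it standardizes each sample rather than each gene. Per-gene z-scores need the rows to be standardized instead.

```python
from statistics import mean, stdev


def combine_on_common_genes(datasets):
    """Concatenate samples across datasets, keeping only genes present in all of them."""
    shared = set(datasets[0])
    for ds in datasets[1:]:
        shared &= set(ds)
    genes = sorted(shared)
    # One row per shared gene; columns are the samples of every dataset in order.
    matrix = [[v for ds in datasets for v in ds[g]] for g in genes]
    return genes, matrix


def _zscore(values):
    """Center on the mean and divide by the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        # Constant vector: no variation to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def zscore_rows(matrix):
    """Per-gene z-scores: each row is standardized across samples."""
    return [_zscore(row) for row in matrix]


def zscore_columns(matrix):
    """Per-sample z-scores: each column is standardized across genes
    (the direction R's scale() works in on a genes-by-samples matrix)."""
    scaled_cols = [_zscore(col) for col in zip(*matrix)]
    return [list(row) for row in zip(*scaled_cols)]


# Hypothetical toy data: two datasets that share only two genes.
ds_a = {"TP53": [5.0, 6.0, 7.0], "BRCA1": [2.0, 4.0, 6.0], "EGFR": [1.0, 1.5, 2.0]}
ds_b = {"TP53": [8.0, 10.0], "BRCA1": [3.0, 5.0], "MYC": [9.0, 9.5]}

genes, combined = combine_on_common_genes([ds_a, ds_b])
by_gene = zscore_rows(combined)
by_sample = zscore_columns(combined)
```

Which direction is appropriate depends on the downstream analysis; the sketch only shows that the two choices give different matrices, so it is worth checking which one a given call actually produced.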

Is this method correct? What are some other ways to run ML algorithms on gene expression data? The goal of running ML is to find genes or gene signatures that separate disease samples from healthy ones.

Microarray GSVA Z-score • 855 views

Updated 2.3 years ago by Ram 45k • written 3.0 years ago by SnehaS • 0

score 1 · Accepted Answer · 2022-04-10

3.0 years ago

Kevin Blighe 89k

Hi SnehaS, I provide some generic guidance here: How to integrate multiple data sets from microarray platform prior meta-analysis?

Kevin


Thank you Kevin

3.0 years ago by SnehaS • 0

You are welcome, SnehaS

3.0 years ago by Kevin Blighe 89k