Question

How to find potential batch effects in microarray samples?

8 months ago

namck • 0

I am trying to analyze a microarray dataset from NCBI GEO (GSE92538; platform IDs GPL10526 and GPL17027). I want to perform a gene expression analysis comparing SCZ and Control. Before proceeding, I want to check whether there are any batch effects, since there are potential covariates (such as age, race, post-mortem interval, and brain pH), and correct for them if required.

But looking at the metadata, I found that the 58 SCZ samples and 176 Control samples differ in platform ID (mostly GPL10526 for SCZ samples, GPL17027 for Control samples), cohort (schiz_cohort 1, 2), processing location (UCDavis, UCIrvine, UcMichigan), and QC batch (qc_batch 1-7).

I am confused about how to identify the batch effect.

Would it be reasonable to take one platform ID (GPL10526), pick out one cohort at a time (e.g., schiz_cohort_1 for both SCZ and Control samples, then schiz_cohort_2 for both SCZ and Control samples, etc.), and draw a PCA plot to look for the batch effect?

Then how do I define the batch effect? Or should I analyze the samples batch by batch (e.g., qc_batch 1 for both SCZ and controls, etc.)?

And then repeat the steps for platform ID GPL17027?

I am confused.

Any help would be appreciated.

Thank you

Gene-expression Batch-effect Microarray • 814 views

ADD COMMENT • link 8 months ago by namck • 0

score 0 · Answer 1 · 2024-09-13

8 months ago

Malachi Griffith 20k

Here is an example of an exercise from a bioinformatics workshop that attempts to explain the concept of batch effects, their identification using PCA, batch correction using ComBat, and the assessment and interpretation of the impact of batch correction.

The accompanying hands-on exercise uses an RNA-seq dataset, but the concepts (and the batch correction tool) really come from microarray analyses.

I would recommend walking through the exercise step-by-step, reading the explanatory text and reference articles, and thinking about the outcome.

The exercise assumes access to a command line and R.
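
Before running PCA or ComBat on a dataset like the one in the question, it may help to check how the candidate batch variable is distributed across the diagnosis groups. The following is a minimal sketch in plain Python, not part of the workshop exercise. The `samples` list is a made-up toy sample sheet (it is not the real GSE92538 metadata), and `cross_tab` and `empty_cells` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical toy sample sheet of (diagnosis, platform) pairs.
# These values are invented for illustration, NOT the real GSE92538 metadata.
samples = [
    ("SCZ", "GPL10526"), ("SCZ", "GPL10526"), ("SCZ", "GPL10526"),
    ("SCZ", "GPL17027"),
    ("Control", "GPL17027"), ("Control", "GPL17027"), ("Control", "GPL17027"),
    ("Control", "GPL10526"),
]


def cross_tab(pairs):
    """Count samples for every (diagnosis, batch) combination."""
    return Counter(pairs)


def empty_cells(pairs):
    """Return the (diagnosis, batch) combinations that contain no samples."""
    counts = cross_tab(pairs)
    groups = sorted({group for group, _ in pairs})
    batches = sorted({batch for _, batch in pairs})
    return [(g, b) for g in groups for b in batches if counts[(g, b)] == 0]


table = cross_tab(samples)
for (group, batch), n in sorted(table.items()):
    print(f"{group:8s} {batch:9s} {n}")
print("empty cells:", empty_cells(samples))
```

If a batch level contains only one diagnosis group, the batch and the diagnosis cannot be separated for those samples, and that is worth knowing before choosing a correction strategy. The same table can be built for cohort, processing location, and QC batch.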

ADD COMMENT • link 8 months ago by Malachi Griffith 20k

Hi Malachi Griffith,

Thank you for your help. I treated the GEO platform as the batch variable. I merged the two datasets (DATASET1-GSE92538-GPL10526, DATASET2-GSE92538-GPL17027) and removed batch effects using the removeBatchEffect function of the limma R package. But the result I got is confusing.

The first UMAP plot (before removing the batch effect) seemed quite OK, with little batch effect (a few samples from the same platform clustered together). But after removing the batch effect, the UMAP plot seemed to show an even stronger batch effect, with a larger number of samples from the same platform clustering together.

I am a bit confused. Did I make a mistake somewhere? If not, why am I seeing the opposite trend? Or am I interpreting the plots wrongly?

Plots are attached for your reference.

[Figure: UMAP plot before batch effect correction]

[Figure: UMAP plot after batch effect correction]
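
As a point of reference when reading a before/after comparison like this, here is a toy sketch of what a platform-only correction does to a single probe. It is a deliberately simplified stand-in (per-batch mean-centering), not limma's actual implementation. The numbers are invented, and `center_by_batch` and `group_gap` are hypothetical helper names.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Shift every batch so that all batch means coincide.

    The common target is the average of the batch means.
    """
    batch_means = {}
    for b in set(batches):
        batch_means[b] = mean([v for v, vb in zip(values, batches) if vb == b])
    target = mean(list(batch_means.values()))
    return [v - batch_means[b] + target for v, b in zip(values, batches)]


def group_gap(values, labels, first, second):
    """Difference between the mean of group `first` and the mean of group `second`."""
    a = mean([v for v, lab in zip(values, labels) if lab == first])
    b = mean([v for v, lab in zip(values, labels) if lab == second])
    return a - b


# Invented expression values for one probe; platform and diagnosis are
# fully confounded here (every SCZ sample on "A", every Control on "B").
values = [5.0, 5.2, 4.8, 7.0, 7.2, 6.8]
platform = ["A", "A", "A", "B", "B", "B"]
diagnosis = ["SCZ", "SCZ", "SCZ", "Control", "Control", "Control"]

corrected = center_by_batch(values, platform)

print("platform gap before: ", group_gap(values, platform, "A", "B"))
print("platform gap after:  ", group_gap(corrected, platform, "A", "B"))
print("diagnosis gap before:", group_gap(values, diagnosis, "SCZ", "Control"))
print("diagnosis gap after: ", group_gap(corrected, diagnosis, "SCZ", "Control"))
```

In this fully confounded toy case, centering by platform also removes the SCZ versus Control difference, because the two labels describe the same split of samples. limma's removeBatchEffect accepts a `design` argument for the conditions that should be preserved. Whether that matters here depends on how the real samples are distributed across the two platforms.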

ADD REPLY • link 8 months ago by namck • 0