Question

Batch effect in the single microarray dataset

2.9 years ago

seta ★ 1.9k

Dear all,

I downloaded the series matrix file of a single microarray dataset (breast cancer). The data were normalized and log-transformed. The box plot of the data is shown here:

[image: box plot of the data]

I collapsed multiple probes of the same gene into a single gene-level row using limma::avereps. The box plot changed slightly after collapsing, as you can see here.
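For readers unfamiliar with the collapsing step, here is a minimal sketch of the idea behind it: rows that share a gene ID are averaged, sample by sample. This is plain Python for illustration only, not limma's implementation; the function name `collapse_probes` and the toy values are hypothetical.

```python
from collections import defaultdict

def collapse_probes(gene_ids, expr_rows):
    """Average rows that share a gene ID, column by column.

    gene_ids  : one gene ID per probe (row)
    expr_rows : one list of per-sample expression values per probe
    Returns a dict mapping gene ID -> averaged per-sample values.
    """
    grouped = defaultdict(list)
    for gene, row in zip(gene_ids, expr_rows):
        grouped[gene].append(row)

    collapsed = {}
    for gene, rows in grouped.items():
        n = len(rows)
        # zip(*rows) walks the samples (columns) across all probes of this gene
        collapsed[gene] = [sum(col) / n for col in zip(*rows)]
    return collapsed

# Toy data (hypothetical): two probes for ESR1, one for ERBB2, two samples
gene_ids = ["ESR1", "ESR1", "ERBB2"]
expr = [[8.0, 6.0], [10.0, 7.0], [5.0, 9.0]]
collapsed = collapse_probes(gene_ids, expr)
```

After collapsing, each gene contributes a single row per sample, so the per-sample distribution is computed over fewer, averaged values. A slight shift in the box plot is therefore plausible.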

Does this change matter, in your professional view? I used the collapsed data to generate a PCA plot colored by cancer subtype, as you can see here.

Could you please let me know whether you see any signs of a batch effect in the PCA plot, especially for the samples located in the right corner of the plot (basal subtype)? If so, please let me know how I can define a batch variable from this information and correct for the batch effect during the analysis.

Many thanks!

gene-expression batch-effect PCA • 1.1k views

updated 3 months ago by Ram 44k • written 2.9 years ago by seta ★ 1.9k

score 1 · Answer 1 · 2022-01-10

2.9 years ago

shiyang_bio ▴ 170

Hi, there is no problem with collapsing probes using limma. From your PCA plot, I cannot get any information about a batch effect. You should color the dots by another variable, such as batch number, rather than by BC subtype. The plot you pasted here looks good, as samples of the same subtype cluster together.
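As a rough numeric companion to recoloring the plot, one could ask how much of the spread along a principal component is explained by a grouping variable. The sketch below is only an illustration and assumes you already have PC1 scores per sample and some batch label per sample; the function name, the scores, and the labels are all hypothetical.

```python
from collections import defaultdict

def variance_explained_by_group(scores, labels):
    """Fraction of the variance in `scores` explained by `labels`
    (between-group sum of squares divided by total sum of squares)."""
    grouped = defaultdict(list)
    for score, label in zip(scores, labels):
        grouped[label].append(score)

    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in grouped.values()
    )
    return between_ss / total_ss if total_ss > 0 else 0.0

# Toy example (hypothetical values): PC1 separates batches, not subtypes
pc1 = [-2.0, -1.0, 1.0, 2.0]
batch = ["A", "A", "B", "B"]
subtype = ["basal", "luminal", "basal", "luminal"]

r2_batch = variance_explained_by_group(pc1, batch)
r2_subtype = variance_explained_by_group(pc1, subtype)
```

A value near 1 for the batch label and near 0 for subtype would suggest that the component tracks batch rather than biology. Without any batch label, this check cannot be run, which is the same limitation as with the plot.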

Best

Thank you for your response. To be honest, there was no information about batches, so I tried to get some idea by plotting the PCA colored by cancer subtype. Could you please let me know what you mean by "other variable", other than batch number?

2.9 years ago by seta ★ 1.9k

What I mean is simply any information about batches. But if you have no such information, then I think there is very little we can do to estimate or remove a batch effect. Maybe you can go ahead with the downstream analysis.

2.9 years ago by shiyang_bio ▴ 170