Dear all,
I downloaded the series matrix file of a single breast cancer microarray dataset. The data were already normalized and log-transformed; here is a box plot of the data:
[box plot image]
I collapsed multiple probes for the same gene into a single gene-level value using limma::avereps. The box plot changed slightly after collapsing, as you can see here:
[box plot image after collapsing]
Does this change matter, in your professional view? I used the collapsed data to generate a PCA plot colored by cancer subtype, as you can see here:
[PCA plot image]
Could you please let me know whether you see any signs of a batch effect in the PCA plot, especially for the samples in the right corner of the plot (basal subtype)? If so, how can I define a batch variable from this information and correct for batch during the analysis?
Many thanks!
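For readers who want to reproduce the before/after comparison described in the question, here is a minimal sketch in base R. The toy matrix, probe names, and gene assignments are made up for illustration and are not from the dataset in the post. The averaging step follows the same idea as limma::avereps (mean of all probes sharing an ID), written with rowsum so it runs without extra packages.

```r
# Toy data (hypothetical): 6 probes mapping to 3 genes, 4 samples.
set.seed(1)
expr <- matrix(rnorm(24, mean = 8, sd = 2), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("S", 1:4)))
gene_ids <- c("ESR1", "ESR1", "ERBB2", "KRT5", "KRT5", "KRT5")

# Average probes that share a gene ID. Same idea as
# limma::avereps(expr, ID = gene_ids), using base R only.
probe_sums   <- rowsum(expr, group = gene_ids, reorder = FALSE)
probe_counts <- rowsum(rep(1, nrow(expr)), group = gene_ids, reorder = FALSE)
collapsed    <- probe_sums / as.vector(probe_counts)

# Per-sample five-number summaries before and after collapsing.
summary_before <- apply(expr, 2, fivenum)
summary_after  <- apply(collapsed, 2, fivenum)
print(round(summary_before, 2))
print(round(summary_after, 2))

# Side-by-side box plots, as in the post.
op <- par(mfrow = c(1, 2))
boxplot(expr, main = "Probe level", las = 2)
boxplot(collapsed, main = "Gene level (averaged)", las = 2)
par(op)
```

Because collapsing changes the number of rows and replaces several probe values with their mean, small shifts in the quartiles and whiskers are expected. Large shifts in one sample relative to the others would be more worth a second look.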
Thank you for your response. To be honest, no batch information was provided, so I tried to get some idea from a PCA plot colored by cancer subtype. Could you please let me know what you mean by "other variable", other than batch number?
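As a rough illustration of the PCA check described here, the sketch below runs prcomp on a simulated gene-level matrix and colors samples by subtype. All data, sample names, and the subtype labels are hypothetical. Note that this only shows whether the main axes of variation line up with subtype; without recorded batch information it cannot tell a batch effect apart from real biology.

```r
# Toy gene-level matrix (hypothetical): 200 genes x 12 samples, 3 subtypes.
set.seed(42)
subtype <- factor(rep(c("Basal", "LumA", "HER2"), each = 4))
expr <- matrix(rnorm(200 * 12, mean = 8), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:12)))
# Shift the first 30 genes in basal samples to mimic a subtype signal.
is_basal <- subtype == "Basal"
expr[1:30, is_basal] <- expr[1:30, is_basal] + 3

# prcomp expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scatter plot of PC1 vs PC2, colored by subtype.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(subtype), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(subtype),
       col = seq_along(levels(subtype)), pch = 19)

# Simple check of whether PC1 tracks subtype: one-way ANOVA of PC1 on subtype.
print(summary(aov(pca$x[, 1] ~ subtype)))
```

If a cluster coincides with a known biological group, as the basal samples do in this simulation, the separation is at least consistent with biology rather than batch, although the two cannot be distinguished when they are confounded.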
What I mean is simply any information about batch. If you have no such information, then I think there is very little we can do to estimate or remove batch effects. You can probably go ahead with the downstream analysis.