Question

Removing batch effect in time-series data

0

6 months ago

Álvaro • 0

I have a proteomics time-series study with 4 time points. My conditions are healthy untreated, healthy treated, disease untreated and disease treated. I suspect there might be a batch effect, since I have found significant differences between time points for a few proteins in my healthy untreated controls.

However, since my batches coincide with the time points I don't know whether it would be possible to remove that batch effect. I would really appreciate your help on this!

batch-effect • 805 views

ADD COMMENT • link updated 5 months ago by Ram 44k • written 6 months ago by Álvaro • 0

score 0 · Answer 1 · 2024-06-16

0

6 months ago

rfran010 ★ 1.3k

Have you performed a more formal analysis (PCA, clustering, correlation, etc.)?

If you have a time dependent batch effect, I would expect that the samples within each timepoint are more similar than the same treatment between timepoints. Is this true?
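One way to put a number on that comparison is to average the sample-to-sample correlations for pairs that share a timepoint but not a condition, and for pairs that share a condition but not a timepoint. The following is only a minimal sketch: the function name, the dictionary keys and the toy data are hypothetical, and Pearson correlation on log intensities is an assumption, not something specified in this thread.

```python
import numpy as np

def similarity_summary(expr, timepoint, condition):
    """expr: proteins x samples array; timepoint/condition: one label per sample."""
    timepoint = np.asarray(timepoint)
    condition = np.asarray(condition)
    corr = np.corrcoef(expr, rowvar=False)    # sample-by-sample Pearson correlation
    iu = np.triu_indices(corr.shape[0], k=1)  # each unordered sample pair once
    same_tp = (timepoint[:, None] == timepoint[None, :])[iu]
    same_cond = (condition[:, None] == condition[None, :])[iu]
    r = corr[iu]
    return {
        "same_timepoint_other_condition": float(r[same_tp & ~same_cond].mean()),
        "same_condition_other_timepoint": float(r[same_cond & ~same_tp].mean()),
    }

# Synthetic toy data only (NOT from the study): a strong per-timepoint shift,
# a weak per-condition shift, and some noise.
rng = np.random.default_rng(0)
n_proteins = 200
timepoint = np.repeat(["T1", "T2"], 6)
condition = np.tile(np.repeat(["untreated", "treated"], 3), 2)
tp_shift = {t: rng.normal(0, 2.0, n_proteins) for t in ["T1", "T2"]}
cond_shift = {c: rng.normal(0, 0.2, n_proteins) for c in ["untreated", "treated"]}
expr = np.column_stack([
    tp_shift[t] + cond_shift[c] + rng.normal(0, 0.5, n_proteins)
    for t, c in zip(timepoint, condition)
])

summary = similarity_summary(expr, timepoint, condition)
print(summary)
```

If the first number is clearly larger than the second on the real data, samples are grouping by timepoint rather than by treatment. That is compatible with a batch effect, but also with a genuine time effect, so it cannot separate the two on its own.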

Also, would the experimental setup make sense with healthy subjects undergoing proteomic changes? For example, if you are looking at four timepoints in a day, circadian rhythms can cause changes in the control population at each timepoint. That would not necessarily be a batch effect, but it may take some domain knowledge to put it in context.

If you do have batch effects, then you might use the healthy untreated as the baseline for each timepoint.
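A rough sketch of that baseline idea: for each protein and timepoint, subtract the mean of the healthy untreated samples from every sample measured at that timepoint. The column names (`protein`, `timepoint`, `condition`, `log_intensity`), the label `healthy_untreated` and the toy values are all assumptions made for illustration. Note that this also removes any real change over time in the healthy untreated group, and the noise of the reference mean carries over into every adjusted value.

```python
import pandas as pd

def subtract_reference_per_timepoint(df, reference="healthy_untreated"):
    """df: long table with columns protein, timepoint, condition, log_intensity."""
    # Mean of the reference group for every protein/timepoint combination
    ref_mean = (
        df[df["condition"] == reference]
        .groupby(["protein", "timepoint"])["log_intensity"]
        .mean()
        .rename("ref_mean")
        .reset_index()
    )
    # Attach the matching reference mean to each row, then subtract it
    out = df.merge(ref_mean, on=["protein", "timepoint"], how="left")
    out["relative_log_intensity"] = out["log_intensity"] - out["ref_mean"]
    return out

# Synthetic toy data only (NOT from the study): one protein, two timepoints,
# with everything measured at T2 shifted upwards by 3 units.
toy = pd.DataFrame({
    "protein": ["P1"] * 6,
    "timepoint": ["T1", "T1", "T1", "T2", "T2", "T2"],
    "condition": ["healthy_untreated", "healthy_untreated", "disease_treated"] * 2,
    "log_intensity": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
})

adjusted = subtract_reference_per_timepoint(toy)
print(adjusted[["timepoint", "condition", "relative_log_intensity"]])
```

In the toy table the disease treated sample sits 1.5 units above the healthy untreated mean at both timepoints once the per-timepoint reference is subtracted, even though the raw values differ by 3 units between T1 and T2.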

ADD COMMENT • link 6 months ago by rfran010 ★ 1.3k

0

I made a clustered heatmap and a PCA of all samples, and the different timepoints did not group together. However, my doubts come from another clustered heatmap in which I plotted the mean of each protein across samples for every combination of conditions (including timepoints), that is: healthy treated T1, healthy treated T2, ... disease untreated T1, ... disease untreated T4. In that heatmap, time points clustered together almost perfectly. Is that a correct approach?

And yes, samples within each timepoint are more similar than the same treatment between timepoints.

About the experimental setup: timepoints were 3 months apart, the first one being pre-treatment. I can't think of any other variable that could be causing those changes.

ADD REPLY • link 6 months ago by Álvaro • 0

1

It's an inevitable batch. You cannot correct for it. The best thing experimentally would be to do minimal processing on the specimen upon harvest, freeze it, and process everything once all samples are available. For computational batch correction I see nothing that can be done, other than having a large n, meaning many donors, to see what is reproducible. I hope the setup is paired, so that you basically only compare within donor across time rather than across timepoints without pairing. Pairing is much more powerful than non-pairing.

ADD REPLY • link 6 months ago by ATpoint 86k

0

I understand, and yes, my samples are paired. So the seasonal differences are not correctable, but I can still test for differences by comparing the changes over time between cases and controls, am I right?

ADD REPLY • link 6 months ago by Álvaro • 0

1

If you sampled by season then there is (to me) no other choice than doing it in batches. It's an inevitable batch. It's paired, so that's great and increases power a lot. There is literature published in a high-impact journal doing seasonal RNA-seq (Nat. Comm 2018, I don't recall the first author) with such a setup. I would go ahead and see what comes out and whether it makes sense. I would say that as long as the libraries were prepped at the same time (so after collecting all samples), it is the best you can do. Theoretically yes, the harvest timepoints are the same as the batches, so it's of course confounded, but for such a time-series setup you cannot avoid it.
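To make the paired comparison discussed above concrete, here is a minimal sketch for a single protein: compute the within-donor change between two timepoints, then compare those changes between cases and controls. The function name, the argument names and the toy numbers are hypothetical, and the Welch t-test is just a simple stand-in chosen for the illustration, not a method anyone in this thread prescribed. The sketch assumes the batch effect is an additive shift on the log scale that hits cases and controls equally at a given timepoint; under that assumption the shift cancels out of the comparison of changes.

```python
import numpy as np
from scipy import stats

def compare_within_donor_changes(case_t1, case_t2, control_t1, control_t2):
    """Each argument: log intensities of one protein, in the same donor order."""
    # Within-donor change over time (this is where the pairing is used)
    delta_case = np.asarray(case_t2, dtype=float) - np.asarray(case_t1, dtype=float)
    delta_control = np.asarray(control_t2, dtype=float) - np.asarray(control_t1, dtype=float)
    # Do cases change differently over time than controls? (Welch t-test on the changes)
    test = stats.ttest_ind(delta_case, delta_control, equal_var=False)
    return float(delta_case.mean() - delta_control.mean()), float(test.pvalue)

# Synthetic toy data only (NOT from the study)
case_t1 = np.array([10.0, 11.0, 12.0, 10.5])
case_t2 = case_t1 + np.array([1.0, 1.5, 0.5, 1.0])
control_t1 = np.array([10.0, 10.5, 11.0, 11.5])
control_t2 = control_t1 + np.array([0.0, 0.5, -0.5, 0.0])

# Hypothetical additive batch shift affecting every sample measured at T2
batch_shift = 2.0

est_clean, p_clean = compare_within_donor_changes(
    case_t1, case_t2, control_t1, control_t2)
est_shifted, p_shifted = compare_within_donor_changes(
    case_t1, case_t2 + batch_shift, control_t1, control_t2 + batch_shift)

print(est_clean, est_shifted)
```

Both calls give the same estimate and p-value, because the shared shift is subtracted out when the case changes are compared with the control changes. A batch effect that differs between groups, or one that is not additive, would not cancel this way.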