I have a proteomics time-series study with 4 time points. My conditions are healthy untreated, healthy treated, disease untreated and disease treated. I suspect there might be a batch effect, since I have found significant differences between time points for a few proteins in my healthy untreated controls.
However, since my batches coincide with the time points I don't know whether it would be possible to remove that batch effect. I would really appreciate your help on this!
I performed a clustered heatmap and PCA for all samples and the different timepoints did not group together. However, my doubts come from another clustered heatmap where I plotted protein means across samples for every combination of conditions (including timepoints), that is: healthy treated T1, healthy treated T2, ... disease untreated T1, ... disease untreated T4. In that heatmap, time points clustered together almost perfectly. Is that a correct approach?
And yes, samples within each timepoint are more similar than the same treatment between timepoints.
About the experimental setup: timepoints were 3 months apart, the first one being pre-treatment. I can't think of any other variable that could be causing those changes.
It's an inevitable batch. You cannot correct for it. The best thing experimentally would be to do minimal processing on the specimen upon harvest, freeze it, and process everything once all samples are available. For computational batch correction I see nothing that can be done, other than having a large n, meaning many donors, to see what is reproducible. I hope the setup is paired, so that you basically only compare within donor across time rather than across timepoints without pairing. Pairing is much more powerful than non-pairing.
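To illustrate why the batch cannot be corrected computationally, here is a minimal sketch. It assumes a hypothetical layout of two samples per time point, with each time point processed as its own batch, and uses a small helper (`matrix_rank`, written here only for illustration) to show that adding batch columns to the design matrix adds no information: they are exact copies of the time columns.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def dummies(value, levels):
    # Treatment coding: the first level is the reference and gets no column.
    return [1 if value == lvl else 0 for lvl in levels[1:]]


timepoints = ["T1", "T2", "T3", "T4"]
batches = [f"batch_{t}" for t in timepoints]

# Hypothetical layout: two samples per time point, one batch per time point.
samples = [(t, f"batch_{t}") for t in timepoints for _ in range(2)]

time_only = [[1] + dummies(t, timepoints) for t, _ in samples]
time_and_batch = [
    [1] + dummies(t, timepoints) + dummies(b, batches) for t, b in samples
]

print("time only:    columns =", len(time_only[0]), "rank =", matrix_rank(time_only))
print("time + batch: columns =", len(time_and_batch[0]), "rank =", matrix_rank(time_and_batch))
```

The second design has more columns than its rank, so the time and batch coefficients cannot be estimated separately. This is the confounding described above, not something a different tool would get around.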
I understand, and yes, my samples are paired. So the seasonal differences are not correctable, but I can still test for differences by comparing how cases and controls change over time, am I right?
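The comparison asked about here can be sketched as a difference of within-donor changes. All numbers below are made up for illustration (one protein, four hypothetical donors, an assumed seasonal shift and an assumed treatment effect); the point is only that a shift shared by every sample at a time point stays in each donor's change but cancels when the changes of two groups are compared.

```python
from statistics import mean

# Made-up log2 abundances for one protein at T1 (pre-treatment), per donor.
baseline = {"d1": 10.0, "d2": 11.0, "d3": 9.5, "d4": 10.5}
treated = {"d1", "d2"}  # hypothetical group assignment

seasonal_shift = 0.8    # assumed; shared by every T2 sample, unknown in practice
treatment_effect = 0.5  # assumed true effect, present only in treated donors


def t2_value(donor):
    """Simulated T2 abundance: baseline + shared shift (+ effect if treated)."""
    effect = treatment_effect if donor in treated else 0.0
    return baseline[donor] + seasonal_shift + effect


# Paired step: each donor is compared with itself, which removes donor baselines.
change = {d: t2_value(d) - baseline[d] for d in baseline}

treated_change = mean(change[d] for d in baseline if d in treated)
untreated_change = mean(change[d] for d in baseline if d not in treated)

# Difference of differences: the shared seasonal shift cancels here.
interaction = treated_change - untreated_change

print("treated change:  ", round(treated_change, 3))
print("untreated change:", round(untreated_change, 3))
print("difference:      ", round(interaction, 3))
```

The untreated change alone still contains the seasonal shift, so it cannot be read as a pure time effect. In a real analysis the per-donor changes would also need a proper statistical test per protein rather than a bare difference of means.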
If you sampled by season then there is (to me) no other choice than doing it in batches. It's an inevitable batch. It's paired, so that's great and increases power a lot. There is literature, published in a high-impact journal, doing seasonal RNA-seq with such a setup (Nat. Comm 2018, I don't recall the first author). I would go ahead and see what comes out and whether it makes sense. I would say that as long as libraries were prepped at the same time (so after collecting all samples), it is the best you can do. Theoretically yes, the harvest timepoints are the same as the batches, so it's of course confounded, but for such a time-series setup you cannot avoid it.
Got it. Thank you so much! That was a lot of help.