Question

Batch correction RNA-seq analysis

1

4 months ago

ka132 ▴ 10

Hi all,

I have RNA-seq data from a single multifactorial experiment that was run in two batches. I unfortunately had no input into which samples were run at which time, and the groups in each of the batches were completely different (e.g., all biological replicates from groups A-E were run in Batch 1 and all biological replicates from groups F-J were run in Batch 2). There are no samples or groups that were included in both batches (controls were only run with Batch 1).

- The samples cannot be re-run, nor can I repeat the experiment (there is no $ to do so).

I am trying to figure out my options here, since I am in charge of analyzing this suboptimal RNA-seq data. I was thinking of using ComBat-seq to adjust for batch effect without specifying biological covariates (I want to avoid overfitting) and then proceeding with analysis as usual. What are people's thoughts on this approach? I'm not sure what else I can do at this point since my hands are tied in terms of what happened to the samples/data before it got to me.

rna-seq ComBat-seq batch correction • 1.3k views

ADD COMMENT • link updated 4 months ago by LauferVA 4.8k • written 4 months ago by ka132 ▴ 10

2

"samples were run"

It would help if you clarified the "run" part further. Which exact part of the experiment was done in two batches: the full experiment in two sets, just the library prep, or just the sequencing? Was there a common control in the two batches?

ADD REPLY • link 4 months ago by GenoMax 153k

0

Just sequencing, no common control unfortunately.

ADD REPLY • link 4 months ago by ka132 ▴ 10

1

There should be no appreciable batch effect from sequencing alone, as long as the following are true: the same sequencer (or at least the same type, i.e. 2-color chemistry and flowcell type) was used for both batches, and the yield of reads per sample is similar. You can track which samples came from which flowcell and check using PCA etc., but it would be surprising if there were a batch effect due to sequencing alone.

I assume you are confirming that the actual experiment, the collection of samples, and the preparation of libraries were all done at the same time by the same person using a common protocol.
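The "yield of reads per sample is similar" condition can be checked quickly from the per-sample read totals. Below is a minimal sketch, assuming you have total reads per sample grouped by flowcell; the numbers and the names `reads_by_flowcell` and `median_yield_ratio` are made up for illustration, and what counts as "similar" is a judgment call rather than a fixed threshold.

```python
from statistics import median

# Hypothetical total read counts per sample (in millions), keyed by flowcell.
# Replace these with the real per-sample totals from your run reports.
reads_by_flowcell = {
    "flowcell_1": [31.2, 29.8, 30.5, 32.1, 28.9],
    "flowcell_2": [30.1, 31.7, 29.4, 30.9, 33.0],
}

def median_yield_ratio(yields):
    """Ratio of the larger to the smaller per-flowcell median yield (>= 1)."""
    medians = [median(values) for values in yields.values()]
    return max(medians) / min(medians)

ratio = median_yield_ratio(reads_by_flowcell)
print(f"median yield ratio between flowcells: {ratio:.2f}")
```

A ratio close to 1 suggests comparable sequencing depth across the two flowcells; a large ratio would be a reason to look more closely, for example with the PCA colored by flowcell.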

ADD REPLY • link 4 months ago by GenoMax 153k

0

The actual experiment and collection of samples were done at the same time. Library prep was definitely done by the same person using the same protocol; the only thing I am not completely certain about is the timing of the library prep (pretty sure it was done at the same time, but I have an email out to confirm that now). The same sequencer should have been used; I am confirming that as well.

ADD REPLY • link 4 months ago by ka132 ▴ 10

0

Well, dangit - I just heard back and the same sequencer was used but library prep was done separately. Bummer. Thanks for the assistance.

ADD REPLY • link 4 months ago by ka132 ▴ 10

1

Since you are stuck with what you have, keep this as another variable (hopefully it is the same as the two sequencing batches). Generally people (if from cores/companies) are consistent as long as they are following an SOP for the preps.

If the samples were collected at different times then ... hope that you can get something usable.

ADD REPLY • link 4 months ago by GenoMax 153k

0

This is a problem of perfect separation.

If there are highly analogous data in a public repository, you may have some options, but generally you'll never truly know what a given difference is attributable to.

ADD REPLY • link 4 months ago by LauferVA 4.8k

score 2 · Answer 1 · 2025-04-01

2

4 months ago

jared.andrews07 ★ 19k

If you have no overlap between batches, there are unfortunately no real options. The experiment is confounded. Consider this an expensive and important lesson.

The best you can do at this point is analyze the data, treat all the results with skepticism, and attempt to validate findings, with the risk that you may be chasing ghosts.
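To make "no overlap between batches" concrete, here is a minimal sketch that tabulates which batches each group appears in. The sample sheet is hypothetical (groups A-E in batch 1, F-J in batch 2, three replicates each, mirroring the layout in the question), and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample sheet as (group, batch) pairs:
# groups A-E in batch 1, groups F-J in batch 2, three replicates each.
samples = [(g, 1) for g in "ABCDE" for _ in range(3)] + \
          [(g, 2) for g in "FGHIJ" for _ in range(3)]

def batches_per_group(sample_sheet):
    """Map each group to the set of batches it appears in."""
    seen = defaultdict(set)
    for group, batch in sample_sheet:
        seen[group].add(batch)
    return dict(seen)

def is_fully_confounded(sample_sheet):
    """True when no group has samples in more than one batch."""
    return all(len(b) == 1 for b in batches_per_group(sample_sheet).values())

print(is_fully_confounded(samples))  # True for the layout above
```

If even one group had samples in both batches, the function would return False, and that shared group is what would give a model some handle on the batch effect.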

ADD COMMENT • link 4 months ago by jared.andrews07 ★ 19k

0

yeah.... this is what I was afraid of. I wish this was a lesson that I needed to learn, but again I had no input on the processing up to this point and would have advised differently if given the opportunity.

Do you think it makes more sense to try batch correcting as I mentioned or just analyze the data as is, considering that both options are suboptimal?

ADD REPLY • link 4 months ago by ka132 ▴ 10

1

Batch correction as you mention shouldn't even run; I expect ComBat-seq will yell at you about confounded covariates if you try.

But as mentioned by GenoMax above, you may not have much of a batch effect if it's just sequencing. I was under the impression that the experiment was done in two parts or that there was known technical variation.

My answer remains the same though - the inability to assess if there's real technical variation will create a cloud of uncertainty over anything you find until it can be validated experimentally.
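The reason a model cannot separate group from batch here is that the design matrix is rank deficient: the batch column is an exact sum of group columns. The sketch below illustrates this with a hypothetical layout (groups A-E in batch 1, F-J in batch 2, three replicates each). It is not ComBat-seq's own check, only a demonstration of the underlying linear algebra, and `matrix_rank` and `design_row` are illustrative helpers.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

groups = "ABCDEFGHIJ"
batch_of = {g: (1 if g in "ABCDE" else 2) for g in groups}  # hypothetical layout

def design_row(group):
    """Intercept, group dummies (A as reference), and a batch-2 dummy."""
    intercept = [1]
    group_dummies = [int(group == g) for g in groups[1:]]
    batch_dummy = [int(batch_of[group] == 2)]
    return intercept + group_dummies + batch_dummy

design = [design_row(g) for g in groups for _ in range(3)]
n_cols = len(design[0])
print(f"rank {matrix_rank(design)} for {n_cols} columns")
```

The rank comes out one less than the number of columns, so one coefficient is not estimable. Dropping the group terms makes the model fittable, but the batch term then absorbs the difference between groups A-E and F-J, whether it is technical or biological.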

ADD REPLY • link 4 months ago by jared.andrews07 ★ 19k

0

Unfortunately, I just found out the libraries were prepped separately as well. Thanks for the advice!

ADD REPLY • link 4 months ago by ka132 ▴ 10

score 0 · Answer 2 · 2025-04-01

0

4 months ago

swbarnes2 15k

Just to emphasize, running on the same kind of instrument at different times does not cause a batch effect.

But doing RNA preps on different days will. Doing library preps on different days will.

ADD COMMENT • link 4 months ago by swbarnes2 15k