Batch effect in integrated RNA-seq analysis
2
0
Entering edit mode
21 months ago
Sakura • 0

Hi,

I'm beginner in bioinformatics as PhD candidate in Japan. I am trying to do an integrating bulk RNA-seq analysis using some public transcriptomic data from multiple facilities, but I'm struggling with batch effect correction. When I try to draw heatmap using TPM value, the result shows different gene expression patterns only for samples at a particular facility.

Indeed, these samples were sequenced using a different sequencer at a different facility. Also, these samples were sequenced as paired-ended while the others were single-ended.

Then, I have corrected normalized TPM data by using Combat-seq, but the PCA plots still present artifacts that are clearly due to inter-facility differences.

I would be grateful if someone could give me some advice.

Best wishes

batch-effect RNA-seq Combat • 4.2k views
ADD COMMENT
2
Entering edit mode
21 months ago

RNASeq is very sensitive to batch effects. It is generally not possible to take experiments from different sources and combine them. Batch differences will always obscure, if not blot out, biological differences.

ADD COMMENT
0
Entering edit mode

Thank you for your reply.

I know there are a number of papers reporting integrated RNA-seq analysis using different data sources, but is it still not scientifically valid?

ADD REPLY
0
Entering edit mode

It would depend on the extent of the batch effect and the nature of the measures taken to correct for it. Do you have papers in mind that use a similar design to what you are hoping to accomplish?

Studies such as https://www.science.org/doi/10.1126/science.aat8127 explicitly include batch as a covariate when testing for DE; with both cases and controls present in every batch. This corrects for any mean differences between batches. For visualization, this model (less the binary outcome of interest) is used as a regression for each gene; and the residuals are visualized. Note that in cases where there is stratification between outcome and batch (i.e., differences in group proportions between batches), some of the group effects will load onto the batch effect, and be removed by this approach.

ADD REPLY
0
Entering edit mode

Dear Dr LChart,

Thank you for your helpful suggestion. Could you let me know more about "Studies such as https://www.science.org/doi/10.1126/science.aat8127 explicitly include batch as a covariate when testing for DE"

Is it different from batch correction like Combat-seq ?

BW

ADD REPLY
1
Entering edit mode

They include a batch variable in their regression model model like

expression ~ disease + BATCH + [other covariates]...

ADD REPLY
0
Entering edit mode

Thank you really, doctor. So, when they calculate expression ratios (m-value, p-value) from expression value such as TPM, they use a regression model that includes the batch as a covariate?

I want to know how to define the batch covariate here, as I could not fine the specific calculation formula in the paper above.

Or may I ask for your direct contact if you would not mind.

Best,

ADD REPLY
0
Entering edit mode

You mention seeing stratification by facility. If you're performing the analysis in R, you can use a variable that stores the facility names such as:

sample_number    sex   facility        group
            1      M      lab_A      Treated
            2      M      lab_A      Control
            3      F      lab_B      Control
            4      F      lab_C      Treated

(etc); and you could regress expr ~ group + sex + facility + 0

to see group effects controlling for sex and facility effects.

ADD REPLY
0
Entering edit mode
21 months ago
John Ma ▴ 310

You mean bulk, or single-cell? The rule of thumb is for single-cell, there'd be batch effects between runs of the microfluidics instrument. For that, I usually use scvi-tools, mostly because I frequently analyse scRNA-seq + CITE-seq data. Seurat also does this using different algorithms.

ADD COMMENT
0
Entering edit mode

Thank you, Mrs. I mean BULK rna-seq using public transcriptomic data.

ADD REPLY

Login before adding your answer.

Traffic: 1522 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6