Question

ComBat-seq on a large dataset before running DESeq2 on a very small subset?

16 months ago

ecamenen • 0

I have an RNA-Seq count table for approximately 400 patients with similar diseases, sequenced in four batches. (We have noticed a strong batch effect in batch 4 and a "moderate" one in batch 1.) Our working group focuses primarily on a single disease, and within this disease I am particularly interested in subgroups that received different treatments. For example, one of my analysis groups consists of only five individuals sampled at two time points. (I am well aware that with such a small sample size the analysis is underpowered.) My goal is to identify differentially expressed genes.

The challenge I'm facing is whether to include batch in the DESeq2 design. Doing so may not yield the best results, since the batch effect would be poorly estimated (e.g., two samples from batch 1 and one from each of the other batches). In such cases, the inter-individual effect might be confounded with the batch effect.

To overcome this challenge, I'm considering running ComBat-seq on the entire initial cohort before running DESeq2 on the subgroup of interest. By doing so, I hope to mitigate the impact of batch effects and improve the accuracy of the differential expression analysis within the subgroup. What do you think?

I apologize if my post is not clear enough; this is my first time posting here. Your insights and suggestions would be highly appreciated. Thank you.

combat RNA-Seq batch-effect DEseq2 DGE • 829 views

ADD COMMENT • link 15 months ago by ecamenen • 0

I've never used ComBat-Seq, but I think that two approaches with more widely used tools could generally go as follows:

Pass batch into the DESeq2 design matrix, find the logFC (for the comparison of interest) and unadjusted p-value for your genes of interest, and simply report those.
Use ComBat to get a normalized matrix and then simply plot the values for your genes/samples of interest by group. I don't think it's good practice to perform any statistics here, but you say this work is exploratory, so this could give you an intuitive feel for what is happening.

ADD REPLY • link 16 months ago by bkleiboeker ▴ 370

Hi there and thank you for your reply!

In my situation, I'm looking for an alternative approach because of the limited number of samples I have. With so few samples, it is practically impossible to estimate the effect of four batches accurately in some of my tiny datasets, which can consist, for example, of five individuals compared across two conditions using DESeq2. This design risks confounding the batch effect with the between-individual variability (I am fully aware of the statistical power limitations inherent in this hypothetical analysis).

Sample	Batch	Time
S1	B1	T1
S1	B1	T2
S2	B1	T1
S2	B1	T2
S3	B2	T1
S3	B2	T2
S4	B3	T1
S4	B3	T2
S5	B4	T1
S5	B4	T2
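
To make the confounding in this table concrete, here is a minimal sketch in plain Python (not DESeq2 itself) that builds treatment-coded design matrices for the ten samples above and checks their rank. The helper names (`dummies`, `design`, `rank`) and the three candidate designs are illustrative assumptions, not something taken from the thread.

```python
from fractions import Fraction

# Sample sheet copied from the table above: (individual, batch, time point).
samples = [
    ("S1", "B1", "T1"), ("S1", "B1", "T2"),
    ("S2", "B1", "T1"), ("S2", "B1", "T2"),
    ("S3", "B2", "T1"), ("S3", "B2", "T2"),
    ("S4", "B3", "T1"), ("S4", "B3", "T2"),
    ("S5", "B4", "T1"), ("S5", "B4", "T2"),
]

def dummies(values):
    """Treatment-coded indicator columns; the first level is the reference."""
    levels = sorted(set(values))[1:]
    return [[1 if v == lev else 0 for lev in levels] for v in values]

def design(*factors):
    """Intercept plus dummy columns for each factor, one row per sample."""
    rows = [[1] for _ in samples]
    for idx in factors:
        block = dummies([s[idx] for s in samples])
        rows = [r + b for r, b in zip(rows, block)]
    return rows

def rank(matrix):
    """Matrix rank by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

INDIVIDUAL, BATCH, TIME = 0, 1, 2
paired = design(INDIVIDUAL, TIME)             # ~ individual + time
with_batch = design(INDIVIDUAL, BATCH, TIME)  # ~ individual + batch + time
batch_only = design(BATCH, TIME)              # ~ batch + time

for name, d in [("individual + time", paired),
                ("individual + batch + time", with_batch),
                ("batch + time", batch_only)]:
    print(f"{name}: {len(d[0])} columns, rank {rank(d)}")
# individual + time: 6 columns, rank 6
# individual + batch + time: 9 columns, rank 6
# batch + time: 5 columns, rank 5
```

Because every individual sits in exactly one batch, the batch columns duplicate individual columns, so a design with both terms is not full rank. The batch-plus-time design is full rank, but batches 2 to 4 are each represented by a single individual, which is the confounding described above.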

Nevertheless, there is evidence of batch effects throughout the 400-patient cohort (as observed with UMAP, MDS, SOM and similar methods). Therefore, we would like to address this effect during the preprocessing phase, if possible, so that it is estimated with more statistical power. Note that our dataset consists of RNA-Seq data, as opposed to microarray data, which is why we selected ComBat-seq.

I understand that conventional wisdom favors including covariates directly in the model used for the comparison rather than removing their effect in a separate, earlier model. If no other recourse is available, I would perform a standard DESeq2 analysis without batch in the design matrix. However, given my background as a statistician rather than a bioinformatician, I'm curious whether you've encountered a similar case, for example one where it was possible to perform batch correction on an entire cohort before running DESeq2, without batch in the design, on a small subset.
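
As a rough illustration of what leaving batch out of the design can mean when both time points of an individual come from the same batch, the sketch below uses invented counts for a single gene and assumes a purely multiplicative batch effect that is identical at T1 and T2. The counts, the `batch_factor` values, and the helper `paired_log2fc` are all hypothetical; this is not how DESeq2 or ComBat-seq model the data internally.

```python
import math
import statistics

# Hypothetical counts for ONE gene: (individual, batch, T1 count, T2 count).
# These numbers are invented for illustration only.
true_counts = [
    ("S1", "B1", 100, 150),
    ("S2", "B1", 120, 170),
    ("S3", "B2", 90, 140),
    ("S4", "B3", 110, 160),
    ("S5", "B4", 105, 165),
]

# Hypothetical multiplicative batch effects (strong in B4, moderate in B1).
batch_factor = {"B1": 1.5, "B2": 1.0, "B3": 1.0, "B4": 3.0}

def paired_log2fc(t1, t2):
    """Within-individual log2 fold change between the two time points."""
    return math.log2(t2 / t1)

# Paired log fold changes without and with the batch effect applied.
clean = [paired_log2fc(t1, t2) for _, _, t1, t2 in true_counts]
shifted = [
    paired_log2fc(t1 * batch_factor[b], t2 * batch_factor[b])
    for _, b, t1, t2 in true_counts
]

# Spread of the T1 log2 values across individuals, without and with batch.
spread_clean = statistics.pstdev(math.log2(t1) for _, _, t1, _ in true_counts)
spread_shifted = statistics.pstdev(
    math.log2(t1 * batch_factor[b]) for _, b, t1, _ in true_counts
)

same = all(math.isclose(a, b) for a, b in zip(clean, shifted))
print("paired log2FC unchanged by batch:", same)
print("between-individual spread grows:", spread_shifted > spread_clean)
```

Under these assumptions the batch factor cancels in each within-individual ratio, while the spread between individuals grows. Real batch effects may not be constant within an individual, so this is only a sanity check of the intuition, not evidence that batch can be ignored.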

I hope my problem statement is clear enough, and thank you for your time!

ADD REPLY • link 15 months ago by ecamenen • 0
