How to adjust by multiple variables using ComBat-Seq?
0
0
Entering edit mode
2.3 years ago
ev97 ▴ 40

I am trying to adjust my RNA data using ComBat-Seq (from sva R package) since I realised that there are 3 batches that I need to adjust for:

  • Place (2 levels: place 1, place 2)
  • Library Preparation Date (16 levels - different dates)
  • Type of tube (2 levels: A, B)

I have 960 samples and around 62000 genes.

In my biological matrix, I have: Age, Sex, Group (cases,controls..) and WBC counts.

biol_mat = model.matrix(~Age + as.factor(Sex) + as.factor(Group) + LYMPH + MONO + NEUT, data=phenotype)

In the tutorial of Combat-Seq appears how to adjust by 1 variable but it doesn't tell you how to adjust by more than 1.

I have seen a lot of posts using combat that the only way is to adjust by 1 variable and then, with those results, adjust again by the 2nd variable and so on.

That would be:

### Adjust by library prep.
raw.cts_adjustedLibPrep <- ComBat_seq(counts = raw_cts_matrix, batch=batch_libraryprep, group=NULL, covar_mod = biol_mat)

### Adjust by library prep + type of tube.
raw.cts_adjusted_LibPrep_TypeTube <- ComBat_seq(counts = raw.cts_adjustedLibPrep, batch=batch_type_tube, group=NULL, covar_mod = biol_mat)

### Adjust by library prep + type of tube + place
raw.cts_adjusted_LibPrep_TypeTube_Place <- ComBat_seq(counts = raw.cts_adjusted_LibPrep_TypeTube, batch=batch_place, group=NULL, covar_mod = biol_mat)

For the first adjustment (library prep) it takes around 15min, but for the second... it has been running for more than 2 days.. I stopped it and launch it again, changing the adjustment but I am not sure if it will work..

Does anybody have an idea about how to fix the problem?

Thanks in advance

CombatSeq combat rna-seq sva batch-effect • 3.1k views
ADD COMMENT
1
Entering edit mode

I do not think that this function has been developed with such large sample size in mind, nor am I sure that sequential adjustment makes sense. With 960 samples are you really going to use GLM approaches rather than just using limma (trend/voom). In any case, for a differential analysis you would add these factors into the design, and for corrected counts you could simply use removeBatchEffect from limma, possibly putting back to normal scale by using 2^x on the returned corrected matrix.

ADD REPLY
0
Entering edit mode

First of all, thanks very much for your reply. In relation to your questions...

I need to run the adjustment for 1) Running variancePartition and see how many and which genes are explained by X variables. 2) Drawing some plots, do some cutoffs and do some downstream analyses.

In relation to variancePartition there is one section in the vignette named as "4.2 Removing batch effects before fitting model" that it may work... but I have been trying with the example that they use in the tutorial and I lose the genes (now it doesn't appear the genes anymore and I need them).

This is the example that I am using:

library('variancePartition')
data(varPartData)
# subtract out effect of Batch with linear mixed model
modelFit <- fitVarPartModel( geneExpr, ~ (1|Batch), info )
res2 <- residuals( modelFit )
# fit model on residuals
form <- ~ (1|Tissue) + (1|Individual) + Age
varPartResid2 <- fitExtractVarPartModel( res2, form, info )

IMAGE 1

You should usually get the gene names as your rownames in varPartResid2 as it gives you in this piece of code:

library('variancePartition')
data(varPartData)
form <- ~ (1|Tissue) + (1|Individual) + (1|Batch) + Age
varPart <- fitExtractVarPartModel( geneExpr, form, info )

image 2

Anyway, what I don't understand is that I can adjust 1 variable, but not for more... And I really need the adjusted counts to do other things, without having the effect of the batches...

I will have to try removeBatchEffect but it only allows you to adjust by 2 batches... so, I will have to use sequential adjustment to adjust by 3 variables... Does it work with rnaseq data? Since limma was made for microarray... I am afraid to get wrong results because of this.

ADD REPLY
1
Entering edit mode

removeBatchEffect can take as many batches as you like: https://support.bioconductor.org/p/128357/

limma was originally developed for microarrays but has underwent extensive work for RNA-seq / digital counts, so that will not be an issue. See limma voom and trend pipelines and the limma manual. The limma RNA-seq paper has > 4k citations.

ADD REPLY
1
Entering edit mode

By the way, you posted this in at least three communities. Nobody can stop you from doing that but etiquette would require that you at least mention the cross-posts to avoid double efforts.

https://bioinformatics.stackexchange.com/questions/19456/how-to-adjust-by-multiple-variables-using-combat-seq

https://support.bioconductor.org/p/9145644/

ADD REPLY
0
Entering edit mode

Sorry, I didn't know if I would get an answer, I will delete the post from the rest of the communities. Thanks very much and sorry for any inconvenience

ADD REPLY
0
Entering edit mode

Leave it, just add a link to the other communities. Different brains will give different answers. I do not pretend that I am the one genuine expert in anything. Better answers may/will exist.

ADD REPLY

Login before adding your answer.

Traffic: 2665 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6