Question

Help with multiple batch effects

7.1 years ago

fp89 ▴ 30

Hello,

I have an expression matrix of 1208 samples (1095 tumor and 113 normal) downloaded from TCGA. I know there are three batch effects: type (`group` in the code below), plateId and TSS. I've tried to correct for them with ComBat, but I need a little help with the model.matrix() designs.

# sample annotation, one row per sample (plate, tumor/normal group, tissue source site)
batch <- data.frame(plateId, group, TSS, stringsAsFactors = TRUE)

## correct for group
mod.1 <- model.matrix(~ plateId + TSS, data = batch)
bat.1 <- ComBat(dat = dati, batch = batch$group, mod = mod.1, mean.only = TRUE, par.prior = TRUE, prior.plots = FALSE)

## correct for plateId
mod.2 <- model.matrix(~ group + TSS, data = batch)
bat.2 <- ComBat(dat = bat.1, batch = batch$plateId, mod = mod.2, mean.only = TRUE, par.prior = TRUE, prior.plots = FALSE)

## correct for TSS
mod.3 <- model.matrix(~ group + plateId, data = batch)
bat.3 <- ComBat(dat = bat.2, batch = batch$TSS, mod = mod.3, mean.only = TRUE, par.prior = TRUE, prior.plots = FALSE)

Something is wrong. The error message says:

Error in ((dat - t(design %*% B.hat))^2) %*% rep(1/n.array, n.array) : 
  requires numeric/complex matrix/vector arguments

Can anyone help me? I'm a student. Thanks in advance.

sva combat batch-effect • 5.6k views

updated 15 months ago by Ram 45k • written 7.1 years ago by fp89 ▴ 30
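A likely cause of this error, offered as a hedged diagnosis rather than a certainty: ComBat expects `dat` to be a numeric matrix with genes in rows and samples in columns. If `dati` is a data frame, or a matrix that became character because it still carries a gene-ID column, the matrix arithmetic inside ComBat fails with this kind of "requires numeric/complex matrix/vector arguments" error. The sketch below uses a made-up toy `dati` (the gene-ID column, sample names and annotation values are all hypothetical) to show the conversion and a quick full-rank check on the covariate design; with many plates and sites, designs that combine the other batch variables can easily become confounded, which ComBat also rejects. Note too that `group` (tumor vs normal) is biology rather than batch: passing it as `batch` would remove the tumor/normal difference, so in this sketch it is kept in `mod` instead.

```r
# Hypothetical toy stand-in for `dati`: a data frame whose first column holds
# gene IDs and whose other 12 columns are samples (s1..s12).
set.seed(1)
dati <- data.frame(gene = paste0("g", 1:50),
                   matrix(rnorm(50 * 12, mean = 8), nrow = 50,
                          dimnames = list(NULL, paste0("s", 1:12))))

# ComBat needs a numeric matrix: genes in rows, samples in columns.
dat <- as.matrix(dati[, -1])
rownames(dat) <- dati$gene
stopifnot(is.numeric(dat))

# Hypothetical sample annotation, one row per column of `dat`, as factors.
batch <- data.frame(
  group   = factor(rep(c("tumor", "normal"), times = 6)),
  plateId = factor(rep(c("p1", "p2", "p3"), each = 4)),
  TSS     = factor(rep(c("A", "A", "B", "B"), times = 3)),
  row.names = colnames(dat)
)

# Covariates to protect while adjusting for plateId. The design must be full
# rank, otherwise the covariates cannot be separated from one another.
mod <- model.matrix(~ group + TSS, data = batch)
stopifnot(qr(mod)$rank == ncol(mod))

# Run ComBat only if the Bioconductor package sva is installed.
if (requireNamespace("sva", quietly = TRUE)) {
  adjusted <- sva::ComBat(dat = dat, batch = batch$plateId, mod = mod,
                          par.prior = TRUE, prior.plots = FALSE,
                          mean.only = TRUE)
}
```

With real data, `dati[, -1]` would be replaced by whatever subset of columns holds the expression values, and the annotation rows must be in the same order as the matrix columns.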

score 1 · Answer 1 · 2018-06-28

7.1 years ago

Kevin Blighe 89k

Going by the numbers, this looks like the TCGA breast cancer (BRCA) data. I have analysed this data many times and have never noticed an effect of type, plateId, or TSS on the expression values. What evidence do you have that they are biasing the counts?

To adjust for batch effects, please avoid the use of ComBat at all costs. You have a couple of options:

- Model the batch effects by including them as covariates in the design formula. This adjusts the test statistics accordingly, but does not modify the counts directly.
- Remove batch effects directly with removeBatchEffect() from limma. There are many posts on the Bioconductor forum about using this function, for example: "Is the following a correct usage of Limma's removeBatchEffect() for clustered heat map generation?"

Kevin

7.1 years ago by Kevin Blighe 89k
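The two options in the answer above can be sketched in base R on simulated data (sample sizes, effect sizes and batch labels are invented). Option 1 leaves the values untouched and tests the group effect in a per-gene linear model that also contains batch. Option 2 reproduces the idea behind limma's `removeBatchEffect()`: fit batch (sum-to-zero coding) alongside the biological design, then subtract only the fitted batch part. This is an illustration, not a replacement for limma; in practice you would call `limma::removeBatchEffect()` itself, and, as the limma documentation notes, the corrected values are meant for plotting and unsupervised analyses (heatmaps, PCA), not as input to differential testing.

```r
set.seed(42)
n_genes <- 100
group <- factor(rep(c("normal", "tumor"), each = 6))
batch <- factor(rep(c("b1", "b2", "b3"), times = 4))

# Simulated log-expression (genes x samples): a tumor effect on the first
# 10 genes plus an additive shift per batch on every gene.
y <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes)
y[1:10, group == "tumor"] <- y[1:10, group == "tumor"] + 2
batch_shift <- c(b1 = 0, b2 = 1.5, b3 = -1)
y <- sweep(y, 2, batch_shift[as.character(batch)], "+")

# Option 1: batch as a covariate. The data are not modified; each gene's
# tumor-vs-normal test is adjusted for batch.
p_group <- apply(y, 1, function(g) {
  fit <- lm(g ~ group + batch)
  summary(fit)$coefficients["grouptumor", "Pr(>|t|)"]
})

# Option 2: estimate batch effects while protecting group, then subtract them.
X_group <- model.matrix(~ group)
X_batch <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))[, -1]
fit <- lm.fit(cbind(X_group, X_batch), t(y))
beta_batch <- fit$coefficients[colnames(X_batch), , drop = FALSE]
y_corrected <- y - t(X_batch %*% beta_batch)
```

Because `group` is part of the fitted design in Option 2, the tumor/normal difference stays in `y_corrected`; only the per-batch offsets are removed.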

Hi Kevin, thank you. This MD Anderson page suggests different batch types.

7.1 years ago by fp89 ▴ 30

Hey, fair enough. It's just not something that I have seen anyone else doing. If you want to adjust for a batch effect, though, you should first check that the effect exists. It may very well not exist, or it may exist in complex ways that can only be remedied by improving the study design. Batch effects that affect samples unequally are obviously more difficult to model and adjust for.

7.1 years ago by Kevin Blighe 89k
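One hedged way to check whether a batch effect exists before correcting for it: for every gene, compare a model with the biological group only against one that adds the batch variable, and look at the resulting p-values. Without a batch effect they should be roughly uniform, with about 5% below 0.05; a clear excess of small p-values suggests that batch matters. The data, plate labels and effect sizes below are simulated purely for illustration.

```r
set.seed(5)
group <- factor(rep(c("normal", "tumor"), each = 8))
plate <- factor(rep(c("p1", "p2"), times = 8))

# Simulated log-expression (genes x samples) with a plate shift on 40 genes.
expr <- matrix(rnorm(400 * 16, mean = 8), nrow = 400)
expr[1:40, plate == "p2"] <- expr[1:40, plate == "p2"] + 1.5

# Per gene: F-test for adding plate to a model that already contains group.
p_plate <- apply(expr, 1, function(g) {
  reduced <- lm(g ~ group)
  full    <- lm(g ~ group + plate)
  anova(reduced, full)[["Pr(>F)"]][2]
})

# Fraction of genes with p < 0.05; roughly 0.05 is expected with no batch effect.
mean(p_plate < 0.05)
```

A histogram of `p_plate` gives the same information visually: a flat shape suggests no plate effect, a spike near zero suggests one.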

I had the same issue with RNA-seq data, so I used svaseq. There was almost no change in the data even after removing the batch effect, so my impression is that in RNA-seq the effect is not large.

7.1 years ago by 1769mkc ★ 1.3k

Hi, I'm a bit confused. How can I detect the presence of batch effects? With PCA, OK, but how do I interpret the plot? This is my PCA (red: tumor samples; blue: normal samples).

7.1 years ago by fp89 ▴ 30

I would suggest going for unsupervised clustering; this figure looks very confusing.

7.1 years ago by 1769mkc ★ 1.3k
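Unsupervised clustering can be sketched in base R with `hclust()`: cluster the samples on the most variable genes, cut the tree into k groups, and cross-tabulate the clusters against each candidate batch variable. If clusters line up with plate or TSS rather than with biology, that hints at a batch effect. The simulated data, the choice of 100 genes, Euclidean distance, average linkage and k = 2 are all illustrative assumptions, not recommendations.

```r
set.seed(3)
tss <- factor(rep(c("A", "B"), each = 8))

# Simulated log-expression (genes x samples) with a site shift on 100 genes.
expr <- matrix(rnorm(300 * 16, mean = 8), nrow = 300)
expr[1:100, tss == "B"] <- expr[1:100, tss == "B"] + 2

# Keep the 100 most variable genes, then cluster samples (columns).
top <- order(apply(expr, 1, var), decreasing = TRUE)[1:100]
hc <- hclust(dist(t(expr[top, ])), method = "average")
clusters <- cutree(hc, k = 2)

# Do the clusters coincide with the site labels?
tab <- table(cluster = clusters, TSS = tss)
tab
```

The same cross-tabulation can be repeated for each annotation column (group, plateId, TSS); `plot(hc, labels = tss)` draws the dendrogram labelled by site.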

When I saw your figure, I said 'Ouch...!' - it does look a bit messy, but it's just due to the labels.

When I look closer, I do not see anything unusual: the 11 (blue) samples are normal tissue, whilst the 01 (red) samples are tumours (assuming you are using 11 and 01 to refer to the sample-type codes in the TCGA barcodes). I see this same distribution in each and every TCGA dataset that I analyse.

A batch effect could be inferred from PCA if PC1 explains a very large proportion of the variation; that proportion could be upwards of 90%.

7.1 years ago by Kevin Blighe 89k
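To put numbers on a PCA plot, the proportion of variance on each component can be computed from `prcomp()`, and the PC scores can be related to a batch variable. The sketch below simulates a plate shift (invented data and labels) and reports the percentage of variance explained by the first five PCs, plus the R-squared of each PC's scores regressed on plate. A component that carries a large share of the variance and is strongly tied to plate is the pattern that would point to a batch effect.

```r
set.seed(7)
plate <- factor(rep(c("p1", "p2", "p3", "p4"), each = 5))

# Simulated log-expression (genes x samples); plate p4 is shifted on all genes.
expr <- matrix(rnorm(200 * 20, mean = 8), nrow = 200)
expr[, plate == "p4"] <- expr[, plate == "p4"] + 1.5

# PCA with samples as observations (rows of the transposed matrix).
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * var_explained[1:5], 1)   # % of total variance on PC1..PC5

# How much of each PC's sample scores does plate explain?
pc_r2 <- sapply(1:5, function(k) summary(lm(pca$x[, k] ~ plate))$r.squared)
names(pc_r2) <- paste0("PC", 1:5)
round(pc_r2, 2)
```

With real data, `expr` would be the expression matrix and `plate` the plateId (or TSS) annotation in the same sample order.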

Thank you. These are my clustering results for group, plateId and TSS.

7.1 years ago by fp89 ▴ 30

Thanks for sharing, and well done! Those are pretty cool dendrograms. Also, apologies if my comment (the 'Ouch...!' part) came across negatively. I still don't see any major reason to adjust for any of these (group, plateId, TSS). The group is different because those are normal tissue samples, so they are expected to be different. What do you think?

7.1 years ago by Kevin Blighe 89k