Question

Batch correction prior to running DESeq2

6 months ago

jayju

I have a large time-series dataset with multiple conditions, for which I'm running RNA-seq and differential gene expression (DGE) analysis with DESeq2. With 100+ samples, I couldn't process them all at once, so I have several batches.

When I include all of the variables in the DESeq2 design (~run + batch + strain + minute + strain:minute), I get the error that the model matrix is not full rank.
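A quick way to see why a design is rejected is to build the design matrix and compare its rank with its number of columns. The sketch below does this in plain Python, assuming R-style treatment coding (first level as reference) with main effects only; the interaction term is left out for brevity. The helper names and the example layout, in which every run falls inside a single batch, are hypothetical. In R the equivalent number comes from `qr(model.matrix(~ run + batch + strain, coldata))$rank`, where `coldata` stands in for your own colData.

```python
from fractions import Fraction

def one_hot(values):
    """Treatment-coded dummies: one column per level except the first (reference) level."""
    levels = sorted(set(values))[1:]
    return [[1 if v == lvl else 0 for lvl in levels] for v in values]

def design_matrix(*factors):
    """Intercept column followed by the dummy columns of each factor (main effects only)."""
    rows = [[1] for _ in range(len(factors[0]))]
    for f in factors:
        for row, dummies in zip(rows, one_hot(f)):
            row.extend(dummies)
    return rows

def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals (no rounding issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical layout: runs R1/R2 only in batch B1, runs R3/R4 only in batch B2.
batch  = ["B1"] * 4 + ["B2"] * 4
run    = ["R1", "R1", "R2", "R2", "R3", "R3", "R4", "R4"]
strain = ["A", "B"] * 4

full = design_matrix(run, batch, strain)
print(len(full[0]), matrix_rank(full))        # 6 5 -> not full rank
reduced = design_matrix(batch, strain)
print(len(reduced[0]), matrix_rank(reduced))  # 3 3 -> full rank
```

In this layout the batch B2 column equals the sum of the R3 and R4 run columns, so the matrix has one more column than its rank; dropping `run` restores full rank.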

To solve this, I tried, among other things, the approach the DESeq2 developers describe for nested designs (individuals nested within groups):

https://bioconductor.riken.jp/packages/3.6/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups
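The linked vignette section handles individuals nested within groups by renumbering individuals within each group (its `ind.n` column) and then using a design such as `~ grp + grp:ind.n + grp:cnd`. The recoding step can be sketched as follows; the function names and sample labels are hypothetical, and whether this nesting actually matches the structure of this experiment is an assumption to check against the colData.

```python
def nest_within_group(groups, individuals):
    """Renumber individuals 1..k within each group, in order of first appearance."""
    index = {}  # group -> {individual: within-group number}
    nested = []
    for g, ind in zip(groups, individuals):
        numbers = index.setdefault(g, {})
        if ind not in numbers:
            numbers[ind] = len(numbers) + 1
        nested.append(numbers[ind])
    return nested

def empty_interaction_levels(groups, nested):
    """(group, within-group number) pairs that never occur in the samples."""
    present = set(zip(groups, nested))
    return [(g, n) for g in sorted(set(groups)) for n in sorted(set(nested))
            if (g, n) not in present]

# Balanced example: two individuals per group, two samples per individual.
groups      = ["X"] * 4 + ["Y"] * 4
individuals = ["i1", "i1", "i2", "i2", "i3", "i3", "i4", "i4"]
print(nest_within_group(groups, individuals))  # [1, 1, 2, 2, 1, 1, 2, 2]

# Unbalanced example: group Y has a third individual, so ("X", 3) never occurs.
groups_u = ["X"] * 4 + ["Y"] * 6
inds_u   = ["i1", "i1", "i2", "i2", "i3", "i3", "i4", "i4", "i5", "i5"]
nested_u = nest_within_group(groups_u, inds_u)
print(empty_interaction_levels(groups_u, nested_u))  # [('X', 3)]
```

When groups hold different numbers of individuals, combinations like `("X", 3)` produce all-zero interaction columns. Per the vignette, those columns have to be removed from the model matrix by hand, and the corrected matrix is then passed to the `full` argument of `DESeq`.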

Whatever I do, the model matrix is still not full rank. I will try to find a biostatistician to work with, but in the meantime I was wondering whether it's a good idea to do the batch correction with another tool first and then move to DESeq2. What is the consensus? Do folks have a specific tool they prefer for this?

R RNA-seq DESEQ2 DGE transcriptomics

Written 6 months ago by jayju; updated 6 months ago by Ram.

Please post your colData (or a meaningful subset of it) so we can take a look. Almost certainly some of your variables are encoded suboptimally.

Reply by ATpoint, 6 months ago

In general, sequencing samples from the same batch on different instrument runs does not introduce technical artifacts. If that is what you mean by "run", you should be able to drop it from your design.

There is no clever trick that will help you if your batches are confounded with your experimental variables.
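Whether batches are confounded can be checked before any modeling by cross-tabulating batch against each experimental variable. A minimal sketch in plain Python with hypothetical batch and strain labels: a batch that contains only one strain offers no within-batch strain comparison, and if every batch is like that, batch fully determines strain and the two effects cannot be separated.

```python
from collections import Counter

def crosstab(batches, conditions):
    """Count samples for every (batch, condition) pair that occurs."""
    return Counter(zip(batches, conditions))

def single_level_batches(batches, conditions):
    """Batches whose samples all share one condition level."""
    levels = {}
    for b, c in zip(batches, conditions):
        levels.setdefault(b, set()).add(c)
    return sorted(b for b, cs in levels.items() if len(cs) == 1)

def fully_confounded(batches, conditions):
    """True when every batch holds a single condition level, so batch determines condition."""
    return len(single_level_batches(batches, conditions)) == len(set(batches))

# Hypothetical labels: B1 and B2 each contain one strain, B3 contains both.
batches = ["B1", "B1", "B2", "B2", "B3", "B3"]
strains = ["WT", "WT", "MUT", "MUT", "WT", "MUT"]
for (b, s), n in sorted(crosstab(batches, strains).items()):
    print(b, s, n)
print(single_level_batches(batches, strains))  # ['B1', 'B2']
print(fully_confounded(batches, strains))      # False: B3 contains both strains
```

The same check applies to the time variable (`minute`) and to any other column in the design.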

Reply by swbarnes2, 6 months ago