Hello,
I am analyzing two distinct BioProjects from SRA together. My pipeline was: fastp > salmon > tximport into R.
Now I would like to perform batch correction using the ComBat_seq() function from the sva package, with the BioProject as the batch. However, since the tximport output is more complex than a simple count matrix, I am not sure how to do that. Which element of the txi object does DESeq2 use for calculating DE genes?
My idea was to use the following code:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)
### Batch Correction ###
batch = colData$BioProject
ComBat_seq(txi$counts, batch=batch, group=NULL)
After which I would proceed with DESeq2 as follows:
ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = colData1,
                                   design = ~ BioProject + condition)
Could somebody tell me if that is the correct approach?
Thank you in advance.
Are you saying that the DESeq2 internal batch correction does the job and I don't need the ComBat_seq function?

Yes, please read the manual and search for the keyword "batch"; this is discussed there. https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

Thank you. If I got it right, the way I formulate my design in my DESeqDataSetFromTximport call should be correct, as the BioProject variable contains the batch.

Again: probably yes, but it would help to see the content of colData1.
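
For what it's worth, one way to sanity-check a design like ~ BioProject + condition before running DESeq2 is to confirm that batch and condition are not confounded, i.e. that each BioProject contains more than one condition. Below is a minimal base-R sketch of that check. The sample table is made up, since the real colData1 was not posted, and the labels PRJ_A, PRJ_B, control and treated are placeholders.

```r
# Hypothetical sample table; the real colData1 was not shown in the thread.
colData1 <- data.frame(
  BioProject = factor(rep(c("PRJ_A", "PRJ_B"), each = 4)),
  condition  = factor(rep(c("control", "control", "treated", "treated"), times = 2))
)

# Cross-tabulate batch against condition. If every BioProject has samples
# in only one condition, the two variables cannot be separated.
print(table(colData1$BioProject, colData1$condition))

# Build the same model matrix that the design formula implies.
design_matrix <- model.matrix(~ BioProject + condition, data = colData1)

# The design is estimable only if the matrix has full column rank.
full_rank <- qr(design_matrix)$rank == ncol(design_matrix)
print(full_rank)
```

If full_rank comes out FALSE on the real sample table, batch and condition are confounded and the BioProject effect cannot be separated from the condition effect with this design.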