Hello,
I'm relatively new to RNAseq and have ok experience with R. I'm having trouble performing DESeq on ComBat_seq processed data.
Background:
I'm comparing my bulk RNAseq data to other published data I have collected. For pre-processing, I aligned my trimmed fastq files with STARaligner and used the built in --quantMode GeneCounts
option to get raw counts for each sample. This outputted a *ReadsPerGene.out.tab file for each file, which I extracted the first (gene name) and second (unstranded counts) columns for analysis. Using DESeqDataSetFromHTSeqCount
, I have successfully produced a PCA plot and heatmap, where it became clear I should not go further without performing batch correction.
I backpedaled and performed batch correction using Combat_seq on the raw counts which now outputted a matrix of corrected raw reads. Now to run DESeq on this, I believe I must now use DESeqDataSetFromMatrix
.
The Problem:
I run DESeqDataSetFromMatrix
as follows:
counts_STAR_batch_corrected <- DESeqDataSetFromMatrix(countData = adjusted_counts,
colData = sample_table_matrix,
design = ~condition,
tidy = F)
This results in the following error:
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 1: invalid object for slot "NAMES" in class "SummarizedExperiment": got class "matrix", should be or extend class "character_OR_NULL"
invalid class “SummarizedExperiment” object: 2:
'names(x)' must be NULL or a character vector with no attributes
The input data I have looks as follows:
adjusted_counts: (class is "matrix" "array")
Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
DDX11L1 0 0 0 0 0 0 0
WASH7P 304 526 414 403 448 420 347
MIR6859-3 0 0 0 0 0 0 0
MIR6859-2 0 0 0 0 0 0 0
MIR6859-1 0 0 0 0 0 0 0
MIR6859-4 0 0 0 0 0 0 0
sample_table_matrix: (class is data.frame)
group condition batch
Sample1 Cell1 2D 2
Sample2 Cell1 2D 2
Sample3 Cell1 2D 2
Sample4 Cell2 2D 2
Sample5 Cell2 2D 2
Sample5 Cell2 2D 2
The first columns were both created with colnames
Any thoughts on how to troubleshoot this or what I'm doing wrong would be greatly appreciated (or alternatively how I can use the output from ComBat_seq
with DESeqDataSetFromHTSeqCount
) would be greatly appreciated.
Are your gene names and sample names their own column or the rownames?
They are rownames and columnnames. When I tried the former it threw an error for inconsistent matrix sizes.
The typical way to deal with this is not to alter the counts, but to just include batch in the design.
Do you mean not to use ComBat_seq? My reasoning is to follow the same SOP as where I got the additional data originally and include my data as an additional batch from what they had (we both had sequencing of some of the same cell types and culture conditions). I'd then follow to see changes based on a new different condition I have to see potential significance and/or corroborate my results.