I should preface this post by saying that I am brand new to programming/coding, let alone bioinformatics, RNA-Seq, R, and DESeq2.
My colData is as follows:
sample ID  animalID  genomicID  collectionDate
324        S23-033   24176X12   W0
354        S23-033   24176X13   W1
381        S23-033   24176X10   W2
425        S23-033   24176X15   W3
451        S23-033   24176X14   W4
507        S23-033   24176X16   W7
548        S23-033   24176X11   W10
577        S23-033   24176X9    W12
637        S23-067   24176X27   W1
686        S23-067   24176X31   W2
I am able to create a DESeqDataSet object with DESeqDataSetFromMatrix for each of the columns, but I am only able to successfully run differential expression analysis for ddsAnimalID and ddsCollectionDate. I believe this is because these two columns contain repeated values (e.g. "S23-033" and "W0").
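One way to check which columns contain repeated values is to count the distinct values in each column and compare that count with the number of samples. The following is a minimal base R sketch, not part of the original analysis. It assumes only the ten rows shown above (the full colData may be longer) and assumes the first column is named sampleID.

```r
# colData rebuilt from the ten rows shown in the post.
# Assumption: the first column is called sampleID (the post's header reads "sample ID").
coldata <- data.frame(
  sampleID       = c("324", "354", "381", "425", "451",
                     "507", "548", "577", "637", "686"),
  animalID       = c(rep("S23-033", 8), rep("S23-067", 2)),
  genomicID      = c("24176X12", "24176X13", "24176X10", "24176X15", "24176X14",
                     "24176X16", "24176X11", "24176X9", "24176X27", "24176X31"),
  collectionDate = c("W0", "W1", "W2", "W3", "W4", "W7", "W10", "W12", "W1", "W2"),
  stringsAsFactors = TRUE
)

# Number of distinct values (factor levels) in each column
n_levels <- sapply(coldata, nlevels)
print(n_levels)

# Number of samples, for comparison
print(nrow(coldata))

# Columns in which at least one value occurs more than once
has_repeats <- n_levels < nrow(coldata)
print(names(has_repeats)[has_repeats])
```

For these ten rows, sampleID and genomicID each have as many distinct values as there are samples, while animalID and collectionDate do not.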
This theory is supported by the fact that, when I attempt to run ddsSampleID <- DESeq(ddsSampleID) or ddsGenomicID <- DESeq(ddsGenomicID), I receive the following message for both:
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
Based on my own research, it is the "lack of replicates" that is causing this issue. However, I do not understand why a lack of replicates would be an issue, nor do I know how I would go about fixing this error. I explored the DESeq2 vignette and tried the collapseReplicates function, but I had no success and found no further information that would help.
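The condition named in the error message (as many coefficients as samples) can be inspected with base R's model.matrix, without running DESeq2 at all. This is a sketch under assumptions: it uses only the ten rows shown above, the column name sampleID is assumed, and count_coefficients is a hypothetical helper written for illustration, not a DESeq2 function.

```r
# Three of the colData columns, rebuilt from the ten rows shown in the post.
# Assumption: the first column is called sampleID.
coldata <- data.frame(
  sampleID       = c("324", "354", "381", "425", "451",
                     "507", "548", "577", "637", "686"),
  animalID       = c(rep("S23-033", 8), rep("S23-067", 2)),
  collectionDate = c("W0", "W1", "W2", "W3", "W4", "W7", "W10", "W12", "W1", "W2"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: build the design matrix for a formula and report
# how many samples (rows) and coefficients (columns) it has.
count_coefficients <- function(design, data) {
  mm <- model.matrix(design, data = data)
  c(samples = nrow(mm), coefficients = ncol(mm))
}

print(count_coefficients(~ sampleID, coldata))        # 10 samples, 10 coefficients
print(count_coefficients(~ animalID, coldata))        # 10 samples, 2 coefficients
print(count_coefficients(~ collectionDate, coldata))  # 10 samples, 8 coefficients
```

With a design of ~ sampleID, every sample gets its own coefficient, so the model can fit each sample exactly and no residual variation is left from which to estimate dispersion. The designs based on animalID and collectionDate have fewer coefficients than samples, which is consistent with those two runs succeeding.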
There is clearly something wrong with my colData, but I am stuck on how I should go about rectifying this issue. Any help on what to do will be greatly appreciated. In addition, any supplemental explanation or information would be incredibly helpful, as bioinformatics is a bit esoteric.
Thank you in advance!
What comparison do you want to make with these data? Let's go forward from there.
I actually ended up figuring it out! Thank you for your assistance, nonetheless.