Hi, I am having issues subsetting my DESeqDataSet to compare treatments within just one group of samples in my multi-group experiment.
My coldata contains the following factors: CellType, PatientGroup, and Treatment.
- 2 cell types ($CellType): epithelial cells (EEC) and stromal cells (ESC)
- 4 patient groups ($PatientGroup): A, B, C, D
- 2 treatments ($Treatment): untreated and treated
I used the DESeq2 package in R to analyze differential gene expression following RNA-seq:
#DESeq design formula
dds <- DESeqDataSetFromMatrix(countData = cts_clean,
colData = coldata,
design = ~ CellType+PatientGroup+Treatment)
#setting "untreated" as the reference level and running DESeq
dds$Treatment <- relevel(dds$Treatment, ref = "untreated")
dds <- DESeq(dds)
I want to subset the data and only get the differential expression results when comparing treated to untreated samples within each CellType separately, i.e. identify DEGs between untreated and treated epithelial cells (EEC only). This is the code I used to subset the EEC and ESC samples from the original data set and obtain the separate results:
#subset EEC and ESC samples separately; 66 EEC samples and 69 ESC samples
dds_EEC <- dds[, dds$CellType %in% c("EEC")]
dds_EEC$CellType <- droplevels(dds_$CellType)
dds_ESC <- dds[, dds$CellType %in% c("ESC")]
dds_ESC$CellType <- droplevels(dds_$CellType)
#identify differentially expressed genes using the results function
results_EEC <- results(dds_EEC, contrast=c("Treatment","HPL","SFM"))
results_ESC <- results(dds_ESC, contrast=c("Treatment","HPL","SFM"))
I also tried an alternative line of code for subsetting before using the results function:
dds_EEC <- subset(dds, select=colData(dds)$CellType=="EEC")
dds_EEC$CellType <- droplevels(dds_EEC$CellType)
dds_ESC <- subset(dds, select=colData(dds)$CellType=="ESC")
dds_ESC$CellType <- droplevels(dds_ESC$CellType)
However, when I view the summary of my dds_EEC or dds_ESC, it is still showing all of my samples (total N=135). So for my results, it is still giving me the combined results of both cell types, as if I ran the results function as:
results_wholedataset <- results(dds, contrast=c("Treatment","HPL","SFM"))
Because my samples are distinctly different depending on their cell type, I want to analyze the DEGs separately but still run DESeq on the entire data set (as is recommended in the DESeq2 vignette and FAQs for multiple groups).
Downstream I also want to look at contrasting untreated vs. treated cells per patient group (A, B, C, D) but still within each cell type (EEC or ESC), but I need to get the code to subset the data correctly first before trying a second level of subsetting for individual patient groups.
Can anyone please help identify what I am doing incorrectly with the code?
Thank you!
Have you simply tried running, for example:
, and then generating new results tables?