Subsetting DESeq data set to compare treatments within one group (multi-group experiment)
18 months ago

Hi, I am having issues subsetting my DESeqDataSet to compare treatments within just one group of samples in my multi-group experiment.

My coldata contains the following factors: CellType, PatientGroup, and Treatment.

  • 2 cell types ($CellType): epithelial cells (EEC) and stromal cells (ESC)
  • 4 patient groups ($PatientGroup): A, B, C, D
  • 2 treatments ($Treatment): untreated and treated

I used the DESeq2 package in R to analyze differential gene expression following RNA-seq:

#DESeq design formula
dds <- DESeqDataSetFromMatrix(countData = cts_clean,
  colData = coldata,
  design = ~ CellType+PatientGroup+Treatment)


#setting "untreated" as the reference level and running DESeq
dds$Treatment <- relevel(dds$Treatment, ref = "untreated")
dds <- DESeq(dds)

I want to subset the data and only get the differential expression results when comparing treated to untreated samples within each CellType separately, i.e. identify DEGs between untreated and treated epithelial cells (EEC only). This is the code I used to subset the EEC and ESC samples from the original data set and obtain the separate results:

#subset EEC and ESC samples separately; 66 EEC samples and 69 ESC samples
dds_EEC <- dds[, dds$CellType %in% c("EEC")]
dds_EEC$CellType <- droplevels(dds_EEC$CellType)

dds_ESC <- dds[, dds$CellType %in% c("ESC")]
dds_ESC$CellType <- droplevels(dds_ESC$CellType)

#identify differentially expressed genes using the results function
results_EEC <- results(dds_EEC, contrast=c("Treatment","HPL","SFM"))
results_ESC <- results(dds_ESC, contrast=c("Treatment","HPL","SFM"))

I also tried an alternative line of code for subsetting before using the results function:

dds_EEC <- subset(dds, select=colData(dds)$CellType=="EEC")
dds_EEC$CellType <- droplevels(dds_EEC$CellType)

dds_ESC <- subset(dds, select=colData(dds)$CellType=="ESC")
dds_ESC$CellType <- droplevels(dds_ESC$CellType)

However, when I view the summary of my dds_EEC or dds_ESC, it is still showing all of my samples (total N=135). So for my results, it is still giving me the combined results of both cell types, as if I ran the results function as:

results_wholedataset <- results(dds, contrast=c("Treatment","HPL","SFM"))

Because my samples are distinctly different depending on their cell type, I want to analyze the DEGs separately but still run DESeq on the entire data set (as is recommended in the DESeq2 vignette and FAQs for multiple groups).

Downstream, I also want to contrast untreated vs. treated cells per patient group (A, B, C, D) within each cell type (EEC or ESC). First, though, I need the code to subset the data correctly before trying a second level of subsetting for individual patient groups.

Can anyone please help identify what I am doing incorrectly with the code?

Thank you!

Differential-Expression DESeq2 • 3.5k views

Have you tried simply re-running DESeq on each subset and then generating new results tables? For example:

dds_ESC <- DESeq(dds_ESC)
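
To spell out why the re-run matters: as far as I understand it, results() reads the coefficients stored by the earlier DESeq() call, so subsetting columns afterwards does not refit anything by itself. Below is a minimal sketch on simulated counts. The data, sample sizes, and the choice to drop CellType from the design are my assumptions, not the poster's actual setup.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for cts_clean (hypothetical data, 200 genes x 32 samples)
cts <- counts(makeExampleDESeqDataSet(n = 200, m = 32))

# Hypothetical coldata mirroring the factors described in the question
coldata <- data.frame(
  CellType     = factor(rep(c("EEC", "ESC"), each = 16)),
  PatientGroup = factor(rep(rep(c("A", "B", "C", "D"), each = 4), times = 2)),
  Treatment    = factor(rep(c("untreated", "treated"), times = 16),
                        levels = c("untreated", "treated")),
  row.names    = colnames(cts)
)

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ CellType + PatientGroup + Treatment)
dds <- DESeq(dds)

# Subset to one cell type and drop the now-unused factor level
dds_EEC <- dds[, dds$CellType == "EEC"]
dds_EEC$CellType <- droplevels(dds_EEC$CellType)

# CellType has a single level in the subset, so remove it from the design
design(dds_EEC) <- ~ PatientGroup + Treatment

# Refit on the subset; only now do the results reflect EEC samples alone
dds_EEC <- DESeq(dds_EEC)
res_EEC <- results(dds_EEC, contrast = c("Treatment", "treated", "untreated"))
```

Note that the re-run keeps the size factors estimated on the full data set (DESeq prints a message about using pre-existing size factors), so normalization is still shared across cell types while dispersions and coefficients come from the subset only.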


In general, make a column of colData that concatenates your experimental factors the way you want them, use that concatenated column in your design, and use contrasts to specify which subset to compare against which.
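
A rough sketch of that approach on simulated counts, in case it helps. The column name `group`, the level names, and the sample layout are assumptions for illustration; use whatever levels(dds$Treatment) actually contains in your data.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for the real matrix (hypothetical data)
cts <- counts(makeExampleDESeqDataSet(n = 200, m = 32))

coldata <- data.frame(
  CellType     = rep(c("EEC", "ESC"), each = 16),
  PatientGroup = factor(rep(rep(c("A", "B", "C", "D"), each = 4), times = 2)),
  Treatment    = rep(c("untreated", "treated"), times = 16),
  row.names    = colnames(cts)
)

# Concatenate the factors to be compared into one grouping column
coldata$group <- factor(paste(coldata$CellType, coldata$Treatment, sep = "_"))

# 'group' takes the place of CellType and Treatment in the design
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ PatientGroup + group)
dds <- DESeq(dds)

# Treated vs. untreated within each cell type, from one fitted model
res_EEC <- results(dds, contrast = c("group", "EEC_treated", "EEC_untreated"))
res_ESC <- results(dds, contrast = c("group", "ESC_treated", "ESC_untreated"))
```

This way DESeq runs once on all samples, and each results() call pulls out a within-cell-type comparison.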

That said, are you sure it makes sense to normalize the different cell types together? I assume PC1 separates them; what percentage of the overall variance does it explain? If it's more than, say, 60%, I'd consider splitting them, since the assumptions underlying library size normalization might not hold between cell types.
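
One way to get that percentage, sketched on simulated data. The `condition` column here is only a stand-in for CellType, and the 60% figure above is a rule of thumb rather than a hard cutoff.

```r
library(DESeq2)

set.seed(1)
# Simulated data set; its 'condition' column stands in for CellType
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Variance-stabilize the counts before PCA, blind to the design
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# returnData = TRUE returns the PC coordinates instead of a plot;
# the fraction of variance per component is stored as an attribute
pca_data <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
percent_var <- round(100 * attr(pca_data, "percentVar"))

percent_var[1]  # percent of variance explained by PC1
```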


Thanks for the suggestion! I could concatenate the experimental elements, but that would add an extra factor to the DESeq design formula, and I'm not sure that is the correct way to set up the analysis.

The PC1 variance for cell type is 28%. I'm able to identify DEGs if I run DESeq separately for the two cell types, but I'm debating whether this is acceptable compared to normalizing as one data set.


The new column is not added to the design; it replaces the columns it's made from.
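
To illustrate the replacement for the per-patient-group comparisons mentioned in the question, here is a small base-R sketch with a made-up balanced sample sheet. The `group` column name and the two replicates per combination are assumptions.

```r
# Hypothetical balanced sample sheet: 2 replicates per combination
coldata <- expand.grid(
  Treatment    = c("untreated", "treated"),
  PatientGroup = c("A", "B", "C", "D"),
  CellType     = c("EEC", "ESC"),
  replicate    = 1:2
)

# One combined factor built from all three original columns
coldata$group <- factor(paste(coldata$CellType, coldata$PatientGroup,
                              coldata$Treatment, sep = "_"))

# The design uses only 'group'; CellType, PatientGroup and Treatment drop out
mm <- model.matrix(~ group, data = coldata)

nlevels(coldata$group)       # number of CellType x PatientGroup x Treatment cells
qr(mm)$rank == ncol(mm)      # TRUE when the design matrix is full rank

# With a DESeqDataSet fitted under design = ~ group, a within-cell-type,
# within-patient-group comparison would then look like:
# results(dds, contrast = c("group", "EEC_A_treated", "EEC_A_untreated"))
```

If some combinations have no samples in the real data, those levels simply won't exist in `group`, so it's worth checking table(coldata$group) before fitting.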

If differences between cell types are that small, definitely keep them all together.

Separating them is of course acceptable, but you probably get more power by combining them.


Thanks, will try that!
