I have RNA-seq data for the following conditions (2 replicates each):
DMSO-treated (control guide), inhibitor-treated (control guide), DMSO + KO (two different sgRNAs), and inhibitor + KO (two different sgRNAs).
So far I have analyzed the data with DESeq2, grouping the samples by treatment and sgRNA:
metaData
                 treatment  sgRNA    genotype  grouped
controlA_DMSO    DMSO       control  WT        control_DMSO
controlB_DMSO    DMSO       control  WT        control_DMSO
controlA_Treat.  1uM        control  WT        control_1uM
controlB_Treat.  1uM        control  WT        control_1uM
guide1A_DMSO     DMSO       1        KO        1_DMSO
guide1B_DMSO     DMSO       1        KO        1_DMSO
guide1A_Treat.   1uM        1        KO        1_1uM
guide1B_Treat.   1uM        1        KO        1_1uM
guide2A_DMSO     DMSO       2        KO        2_DMSO
guide2B_DMSO     DMSO       2        KO        2_DMSO
guide2A_Treat.   1uM        2        KO        2_1uM
guide2B_Treat.   1uM        2        KO        2_1uM
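For reference, the metadata table can be rebuilt programmatically so that the grouped labels are derived from the other columns rather than typed by hand. This is a minimal base-R sketch; the sample names and level names come from the table, while the choice of DMSO and the control guide as reference levels is an assumption.

```r
# Sample names as they appear in the metadata table.
samples <- c("controlA_DMSO", "controlB_DMSO", "controlA_Treat.", "controlB_Treat.",
             "guide1A_DMSO",  "guide1B_DMSO",  "guide1A_Treat.",  "guide1B_Treat.",
             "guide2A_DMSO",  "guide2B_DMSO",  "guide2A_Treat.",  "guide2B_Treat.")

metaData <- data.frame(
  # Two DMSO then two 1uM samples, repeated for each of the three guides.
  treatment = factor(rep(rep(c("DMSO", "1uM"), each = 2), times = 3),
                     levels = c("DMSO", "1uM")),       # assumed reference: DMSO
  sgRNA     = factor(rep(c("control", "1", "2"), each = 4),
                     levels = c("control", "1", "2")), # assumed reference: control
  row.names = samples
)

# Genotype follows from the guide: control guide = WT, guides 1 and 2 = KO.
metaData$genotype <- factor(ifelse(metaData$sgRNA == "control", "WT", "KO"),
                            levels = c("WT", "KO"))

# One label per guide/treatment combination, e.g. "1_1uM".
metaData$grouped <- factor(paste(metaData$sgRNA, metaData$treatment, sep = "_"))

print(metaData)
```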
And the following design:
dataSet <- DESeqDataSetFromMatrix(countData = counts, colData = metaData, design = ~ grouped)
Then I would get the results table for each contrast of interest, for example:
dds <- DESeq(dataSet)
res <- results(dds, contrast = c("grouped", "control_1uM", "control_DMSO"), alpha = 0.05)  # ... etc. for the other contrasts
We are interested in the combined effect of knocking out protein X and pharmacologically inhibiting protein Y. However, we knocked out X with two different guides, and they yield different sets of DEGs (guide 1 gives us more DEGs, although I'm not sure whether these are spurious results due to non-specific CRISPR cutting).
For the above heatmap, I called the DEGs for contrast = c("grouped","1_1uM","control_DMSO") and contrast = c("grouped","2_1uM","control_DMSO"), then took the genes common to both results tables for plotting.
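The intersection step described above could look roughly like the sketch below. The helper `common_degs` and the toy tables are hypothetical and only illustrate the idea; the `padj` and `log2FoldChange` column names match what DESeq2 results tables contain. The optional check that both guides change a gene in the same direction is an added assumption, not something done for the heatmap.

```r
# Hypothetical helper: genes significant in both results tables.
# res_a / res_b: tables with gene IDs as row names and columns
# 'padj' and 'log2FoldChange' (as in DESeq2 results tables).
common_degs <- function(res_a, res_b, alpha = 0.05, same_direction = TRUE) {
  # Genes with NA adjusted p-values are treated as not significant.
  sig_a <- rownames(res_a)[!is.na(res_a$padj) & res_a$padj < alpha]
  sig_b <- rownames(res_b)[!is.na(res_b$padj) & res_b$padj < alpha]
  shared <- intersect(sig_a, sig_b)
  if (same_direction) {
    lfc_a <- res_a$log2FoldChange[match(shared, rownames(res_a))]
    lfc_b <- res_b$log2FoldChange[match(shared, rownames(res_b))]
    shared <- shared[sign(lfc_a) == sign(lfc_b)]
  }
  shared
}

# Made-up toy tables, for illustration only (not real results).
genes  <- c("geneA", "geneB", "geneC", "geneD")
toy_g1 <- data.frame(log2FoldChange = c(2.1, -1.5, 0.3, 1.2),
                     padj = c(0.001, 0.01, 0.8, 0.04), row.names = genes)
toy_g2 <- data.frame(log2FoldChange = c(1.8, 1.1, 0.2, 0.9),
                     padj = c(0.002, 0.03, 0.9, NA), row.names = genes)

both_any_direction  <- common_degs(toy_g1, toy_g2, same_direction = FALSE)
both_same_direction <- common_degs(toy_g1, toy_g2)
print(both_any_direction)
print(both_same_direction)
```

In the toy data, geneB is significant for both guides but with opposite signs, so it is dropped only when `same_direction = TRUE`.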
I was wondering whether there is a better approach to the DEG analysis for the KO, and which design and test (Wald or LRT) would be best. I thought of the following approaches, but I'd like to hear other people's thoughts on them:
- Would it be best to select just one of the sgRNAs and base the analysis on it (excluding samples that used the other guide)? If so, how could we look at our data to decide which guide gives the more confident/less spurious DEG results? (In the heatmap above, the two guides seem somewhat consistent, but they differ in a few subsets of genes.)
- How could I build the design formula so that it tests for DEGs related to KO and KO + treatment while controlling for differences between the two sgRNAs used?
- If I pool both guide RNAs by grouping by treatment and genotype (instead of treatment and sgRNA), as shown below, would the dispersion estimates and statistical testing in DESeq2 yield DEGs more consistent with the actual KO (for example, by giving the non-specific/spurious DEGs from each guide RNA higher adjusted p-values)?
grouped
WTDMSO
WTDMSO
WT1uM
WT1uM
KODMSO
KODMSO
KO1uM
KO1uM
KODMSO
KODMSO
KO1uM
KO1uM
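To make the pooled option concrete, the sketch below builds the pooled grouping as a two-factor layout in base R and inspects the resulting design matrix. It assumes the same sample order as the metadata table; the factorial formula is one possible way to encode the pooled grouping, not the only one. Under pooling, the two guides are treated as replicates of a single KO condition, so guide-specific differences would be expected to show up as within-group variability rather than as a separate effect.

```r
# Pooled layout: both guides collapsed into one "KO" genotype.
# Sample order assumed to match the metadata table (4 WT, then 8 KO).
pooled <- data.frame(
  genotype  = factor(rep(c("WT", "KO"), times = c(4, 8)), levels = c("WT", "KO")),
  treatment = factor(rep(rep(c("DMSO", "1uM"), each = 2), times = 3),
                     levels = c("DMSO", "1uM"))
)

# Factorial design: KO effect, treatment effect, and their interaction.
design_full <- model.matrix(~ genotype + treatment + genotype:treatment,
                            data = pooled)
print(colnames(design_full))

# Replicates per pooled group: KO groups have 4 samples, WT groups have 2.
reps <- table(pooled$genotype, pooled$treatment)
print(reps)
```

With DESeq2, this layout would correspond to `design = ~ genotype + treatment + genotype:treatment`. The interaction term (whether the treatment effect differs in the KO) could then be tested either with a Wald test on the interaction coefficient or with `DESeq(dds, test = "LRT", reduced = ~ genotype + treatment)`.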
I was thinking of going with the latter option (pooling the samples treated with different guides), but I would like to hear what other people think.
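An alternative to pooling that keeps the guides as separate groups is to average the two guides inside the contrast. The base-R sketch below builds such contrast weights for a `~ 0 + grouped` design; the helper `avg_contrast` is hypothetical, and the column names are those produced by `model.matrix` for the group labels in the metadata table.

```r
# Group labels from the metadata table, two replicates each.
lv <- c("control_DMSO", "control_1uM", "1_DMSO", "1_1uM", "2_DMSO", "2_1uM")
grouped <- factor(rep(lv, each = 2), levels = lv)

# One coefficient per group (no intercept).
mm <- model.matrix(~ 0 + grouped)

# Hypothetical helper: mean of 'numerator' groups minus mean of 'denominator' groups.
avg_contrast <- function(design, numerator, denominator) {
  w <- setNames(numeric(ncol(design)), colnames(design))
  w[numerator]   <- 1 / length(numerator)
  w[denominator] <- w[denominator] - 1 / length(denominator)
  w
}

# KO + treatment (averaged over both guides) versus control guide in DMSO.
w_ko_treat <- avg_contrast(mm, c("grouped1_1uM", "grouped2_1uM"),
                           "groupedcontrol_DMSO")

# Interaction: treatment effect in KO (averaged over guides)
# minus treatment effect with the control guide.
w_inter <- avg_contrast(mm, c("grouped1_1uM", "grouped2_1uM"),
                        c("grouped1_DMSO", "grouped2_DMSO")) -
           avg_contrast(mm, "groupedcontrol_1uM", "groupedcontrol_DMSO")

print(w_ko_treat)
print(w_inter)
```

If the DESeq2 object is fitted with `design = ~ 0 + grouped`, a weight vector like this could be passed as a numeric contrast, e.g. `results(dds, contrast = w_ko_treat)`, after checking that its order matches `resultsNames(dds)`. Comparing this averaged contrast with the per-guide contrasts is one way to see which DEGs are supported by both guides.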