Hi there,
Essentially, my experimental design is control vs treatment. Cells were sorted based on fluorescence, so there are 4 different "colors" of treated cells, i.e. red, green, green+red, and blue+green+red. I am interested in how the colors differ from one another. And, I have duplicates for all colors and the control.
> colData(dds_no)
DataFrame with 10 rows and 3 columns
sample color sizeFactor
<factor> <factor> <numeric>
CTRL_1 control 0.730176128359336
CTRL_2 control 1.12370593310441
GFP_1 green 1.62229333835717
GFP_3 green 0.733077520973604
GFPRFP_1 greenred 1.24575808750345
GFPRFP_2 greenred 1.27612350159403
RFP_2 red 1.57975191196927
RFP_3 red 0.518878115991023
TRIPLE_1 rgb 0.833793868046399
TRIPLE_3 rgb 1.01334467700869
I am wondering if it is appropriate to remove my control from downstream differential expression analysis when I want to only look at variation between treated cells? When I include the control, the PCA shows that the control samples are well separated from all treated cell types.
What I want (and have done) is removed control from the raw counts, built the deseq2 object with ~color as my design, produced results with LRT instead of Wald, and selected significant DEGs by the adj. pvalue. Then from this, I created a heatmap with my samples (still without control) to show the variation they have with certain significant genes and gene sets.
dds_no = DESeqDataSetFromMatrix(countData=countData_no,
colData=colData_no,
design=~color)
dds_LRT = DESeq(dds_no,test="LRT", reduced=~1)
res_LRT <- results(dds_LRT)
rld_LRT<- rlogTransformation(dds_LRT)
pathsLRT<-assay(rld_LRT)
df_pathLRT <- cbind(rownames(res_LRT), data.frame(res_LRT, row.names=NULL))
topTableLRT <- as.data.frame(df_pathLRT)
sigGeneListLRT <- subset(topTableLRT, padj<=0.05)[,1]
topMatrixLRT <- pathsLRT[which(rownames(pathsLRT) %in% sigGeneListLRT),]
topMatrixPATHSLRT <- gsva(data.matrix(topMatrixLRT),
stress_Cao_etal_2017,
method="gsva",
min.sz=1,
max.sz=Inf,
kcdf="Gaussian",
mx.diff=TRUE,
verbose=TRUE)
heat_pathsLRT <- t(scale(t(topMatrixPATHSLRT)))
pheatmap(heat_pathsLRT, annotation_col=dfdds, fontsize_row = 8, cluster_rows = FALSE, cluster_cols =
TRUE, main="Stress Paths", annotation_legend=FALSE)`
Please let me know if this approach is acceptable or if it truly requires I include the control.
Thanks, Morgan
Ok, that makes a lot of sense. Thanks for helpful advice!