10 months ago
CTLong
Hi all,
Before performing differential expression analysis with DESeq2, I want to run a sanity check to see whether my samples have gene expression profiles within the same range (i.e. to rule out technical variation).
To do this, I am thinking of generating a boxplot for each sample, with the y-axis representing gene expression and each point representing a single gene. I have done this previously with limma-voom transformed data. How can I get the transformed data from DESeq2 to generate a plot similar to the one below? Thanks.
Rather than doing this, I suggest you run a PCA and check whether any outliers can be explained by sequencing depth. If they can, that suggests unequal depth is a confounder that the default normalization cannot remove. In that case you might want to subsample the data to match the sample with the lowest depth, or exclude samples with very low or very high depth.
As for your question, the closest you can get in DESeq2 to "transformed data" is either normalized counts on the log2 scale (the normTransform function) or the output of vst or rlog.
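To make "normalized counts on the log2 scale" concrete, here is a minimal sketch of the arithmetic in plain Python. It assumes the median-of-ratios size factors that DESeq2 uses by default and a pseudocount of 1; the function names and the dict-of-lists input are made up for illustration and are not DESeq2's API. In practice you would call the R functions directly.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts, with genes
    in the same order for every sample (hypothetical input layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples. Genes with a zero
    # count in any sample are skipped (their geometric mean is zero).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median ratio of its counts to the
    # per-gene geometric means. Raises StatisticsError if every gene
    # contains a zero in at least one sample.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def log2_normalized(counts, pseudocount=1.0):
    """log2(count / size_factor + pseudocount) for every gene and sample."""
    factors = size_factors(counts)
    return {
        s: [math.log2(c / factors[s] + pseudocount) for c in counts[s]]
        for s in counts
    }
```

The per-sample lists returned by log2_normalized are what you would feed into one boxplot per sample. Note that vst and rlog additionally stabilize the variance, which this sketch does not attempt.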
Thanks for the reply! I have already done a PCA and the positioning of the samples makes sense. To further check for technical variation, I guess I'll just do a boxplot of the vst output.
For boxplots of gene count distributions, I typically use pseudocounts (i.e. log2(count + 1)) to calculate the relative log expression (RLE) and plot its distribution per sample. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5798764/.
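In case it helps, this is roughly how the RLE values can be computed, as a plain-Python sketch. It assumes the usual definition (each gene's log expression minus that gene's median across samples) applied to log2(count + 1); the function names and the dict-of-lists input are hypothetical, and the median/IQR summary is just a stand-in for the boxplot itself.

```python
import math
from statistics import median, quantiles


def relative_log_expression(counts, pseudocount=1.0):
    """RLE per sample.

    counts: dict mapping sample name -> list of raw counts, with genes
    in the same order for every sample (hypothetical input layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Pseudocounts: log2(count + 1) by default.
    log_counts = {
        s: [math.log2(c + pseudocount) for c in counts[s]] for s in samples
    }

    # Median log expression of each gene across all samples.
    gene_medians = [
        median(log_counts[s][g] for s in samples) for g in range(n_genes)
    ]

    # Deviation of each sample from the per-gene median.
    return {
        s: [log_counts[s][g] - gene_medians[g] for g in range(n_genes)]
        for s in samples
    }


def rle_summary(rle):
    """Median and interquartile range of the RLE values for each sample."""
    summary = {}
    for s, values in rle.items():
        q1, _, q3 = quantiles(values, n=4)
        summary[s] = {"median": median(values), "iqr": q3 - q1}
    return summary
```

Each sample's list from relative_log_expression goes into one box. Samples whose box is centred far from zero, or is much wider than the others, are the ones worth a closer look.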
I also look at the Pearson correlation of raw gene counts between samples and at kernel density plots of the pseudocounts. More often than not, though, PCA is sufficient to reveal technical variation in the data set; these other QC visualizations help to corroborate, or give some basis for, what I see on the PCA plot.
Thanks for the reply! I agree that in most cases PCA should be enough to reveal the technical variation. However, since I suspect that my dataset contains more technical noise than one would expect normally, I think there is a need to be extra cautious. That being said, I have come across the RLE boxplot in previous publications but never really looked into it since it's for data exploration. Thanks for bringing it up, seems like this would be relevant for my study.