Hi everyone, I'm performing analysis of some RNAseq samples, and currently trying to cope with batch effect.plotting PCA of vsd transformed data, I can clearly see two batches which are differ fromt the others.
plotPCA(vsd, intgroup = c('batch'),ntop= 34085)
If I'm not mistaken, then for DE expression analysis I could use design formula of dds function, to reduce batch effect:
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design =~ batch + type)
dds <- DESeq(dds)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
But I also would like to visualise results of batch correction. For PCA plot I could simply follow the DESeq2 manual and just perform limma batch effect removal:
mat <- assay(vsd)
mat <- limma::removeBatchEffect(mat, vsd$batch)
assay(vsd) <- mat
plotPCA(vsd, intgroup = c('batch'),ntop= 34085)
Which gives me a nice results:
But now I'd like to visualise expression of individual genes of interest. I could use vsd transforemd data, but these numbers mask the actual count numbers, therefore laking info on actual level of gene expression, which I'd like to preserve.
So what would be the better solution here? Could I use limma removeBatchEffect() on normalised count table, maybe?
Thanks!
Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here)
What do you mean by
but these numbers mask the actual count numbers
?Hi! I meant that after vsd transformation, genes with let say 1000 and 10 reads will be at the same scale. Nevertheless, the gene with 1000 reads was, probably, detected with higher accuracy, so I'd like to preserve such info. May be for this I should somehow switch to TPMs, but I'd like to avoid it for analysis transparence.
Eugene
Hi Eugene,
how did you generate the box plot from the vsd data? thanks,
Looks like custom code using ggplot2
something like this (given matrix with row - gnes, sample - columns):
Hi! I want to know how you made this boxplot, I saw the script that you used for that but a couldn't get some things. I guess that you converted vst object into matrix called "matrix", but what means "masker" and "row" and also that thing of "deseq_res = FALSE" in the script? I would appreciate if you could explain this for me, I am new in this! Thanks a lot! n_n
matrix - is either count matrix or Deseq object - if later is true - set deseq_res = True
row - gene name you whant to plot 'RAG1'
masker - vector of the length == number of columns in your matrix, which contain group names you whant to see on a chart - c('contr', 'contr', 'contr', 'exp', 'exp') for example (in given example your matrix might have the following colnames: c('control_1', 'control_2', 'control_3', 'experiment_1', 'experiment_2')