Hi all.
I am analyzing some proteomics data in cell lines before and after the silencing of certain genes and at different time points. A PCA (see attached) shows no clustering of my samples that come from the same treatment. The samples were analyzed in the same batch, so batch effects cannot be the reason why I am not seeing what I would expect.
Would it be possible to go on with a differential expression analysis given that my replicates are not clustering together? Any suggested analysis that could help me identify if that would be possible?
The colors represent Condition + Timepoint.
Thank you for your comment. Indeed, I guess I just have to try!
Looks like you are using PCAtools (my package)? If you use plotloadings(), you can see which genes are 'driving' the variation along each PC. Also be wary of using scale = TRUE or scale = FALSE with PCAtools::pca(). I would prefer scale = FALSE.
Yes, I am using PCAtools. (Since we are here, I have to say that even if I discovered it pretty recently, your package quickly became my go-to package for PCA. Thank you!)
Thanks for the tip. Would you mind elaborating a bit why scale = FALSE is preferable? Thank you!
Hi, thanks for the comment regarding PCAtools. Regarding scaling, there was another recent discussion on this, here: C: Scale and Center [normalized] RNA-seq expression counts for PCA ?
Scaling is also not recommended by Michael Love (DESeq2 developer), although I cannot find the post where he mentions this.
In a nutshell, PCA is fundamentally based on variance and covariance; by scaling, we 'disrupt' (break) the natural [true] variation that may exist in our data. A full-time statistician would obviously give a more technical answer.
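To make the scaling point a bit more concrete, here is a minimal NumPy sketch. It is not PCAtools and not how PCAtools is implemented; the pca() helper, the simulated data, and the three features are all hypothetical, and the matrix here is samples x features. Without scaling, PCA works on the covariance matrix, so the feature with the largest spread dominates PC1. With scaling, every feature is forced to unit variance (PCA on the correlation matrix), so that difference in spread disappears from the result.

```python
import numpy as np

def pca(X, scale=False):
    """Hypothetical helper: PCA via SVD on a samples x features matrix.

    Returns (loadings, explained_variance_ratio); loadings has one column per PC.
    """
    Xc = X - X.mean(axis=0)                  # always center each feature
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # optional: unit variance per feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)          # variance captured by each PC
    return vt.T, var / var.sum()

rng = np.random.default_rng(0)
n_samples = 12

# Simulated data (assumption): one feature with large spread, two with small spread
X = np.column_stack([
    rng.normal(0.0, 10.0, n_samples),
    rng.normal(0.0, 1.0, n_samples),
    rng.normal(0.0, 1.0, n_samples),
])

load_raw, evr_raw = pca(X, scale=False)
load_std, evr_std = pca(X, scale=True)

print("PC1 |loadings|, scale=False:", np.round(np.abs(load_raw[:, 0]), 2))
print("PC1 |loadings|, scale=True: ", np.round(np.abs(load_std[:, 0]), 2))
print("PC1 variance share, scale=False:", round(float(evr_raw[0]), 2))
print("PC1 variance share, scale=True: ", round(float(evr_std[0]), 2))
```

Comparing the two sets of PC1 loadings is the same kind of check that plotloadings() supports visually: it shows which features drive each PC, and how much that picture depends on the scaling choice.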