2.7 years ago
Aldo
Hello everyone, I'd like to change the colors of my PCA plot to make the results easier to interpret. This is my matrix of normalized data (23240 elements; I show only a subset here for brevity):
WTYETMM
sample1 sample2
Sox17 18.858747 17.214335
Gm6123 25.144996 28.690559
Mrpl15 544.808244 558.509546
Lypla1 192.778302 148.234554
Tcea1 449.466801 452.354478
Gm6104 18.858747 16.257983
Atp6v1h 320.598698 341.417651
Rb1cc1 296.501410 309.858036
Pcmtd1 550.046785 585.287401
Gm38372 61.814782 61.206526
First I normalized the data using the TMM method, then I ran this script:
library(ggfortify)
prcomp= prcomp(WTYETMM, scale.=TRUE)
autoplot(prcomp, loadings = TRUE,
data = WTYETMM)
This is the PCA I got, but I'd like to show the names of the two samples and color them differently. I appreciate your help. Please do not just send me links; I would appreciate your comments.
There is a lot going on here. First, what kind of data is this? The magnitude of the values suggests they are not on the log scale, so that would be the first thing to fix. Scaling counts is not really meaningful in most cases: you want to put emphasis on rows (genes) that change a lot between columns (samples), and scaling dampens this effect considerably. Also, a PCA is commonly run with the samples as rows (observations), whereas your matrix has the genes as rows, so you would transpose the input first,
prcomp(t(input))
unless the column-wise operation is really intended. Selecting the most variable genes would probably make sense as well, since 23240 genes is a lot and probably only a few of them contribute to the separation of the samples. The mentioned PCAtools does all of that for you. It is not hard to code it up yourself, but why not use an existing package that already does it conveniently and also provides plotting routines?
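If you do want to code it up yourself, the steps above could look roughly like the following. This is only a minimal sketch in base R, not a definitive recipe: it uses the ten genes posted in the question, and the pseudocount of 1, the number of genes kept (n_top), and the two colors are illustrative assumptions rather than anything taken from the original analysis.

```r
# Excerpt of the TMM-normalized matrix from the question (genes x samples).
WTYETMM <- matrix(
  c( 18.858747,  17.214335,
     25.144996,  28.690559,
    544.808244, 558.509546,
    192.778302, 148.234554,
    449.466801, 452.354478,
     18.858747,  16.257983,
    320.598698, 341.417651,
    296.501410, 309.858036,
    550.046785, 585.287401,
     61.814782,  61.206526),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Sox17", "Gm6123", "Mrpl15", "Lypla1", "Tcea1",
      "Gm6104", "Atp6v1h", "Rb1cc1", "Pcmtd1", "Gm38372"),
    c("sample1", "sample2"))
)

# Log-transform; the pseudocount of 1 is an assumption to avoid log2(0).
logexpr <- log2(WTYETMM + 1)

# Keep the most variable genes (n_top is an illustrative choice).
n_top <- 5
gene_var <- apply(logexpr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(min(n_top, nrow(logexpr)))]

# Transpose so that samples are rows; center but do not scale.
pca_res <- prcomp(t(logexpr[top_genes, , drop = FALSE]),
                  center = TRUE, scale. = FALSE)

# One color per sample, with the sample names drawn next to the points.
scores <- pca_res$x
sample_cols <- c(sample1 = "firebrick", sample2 = "steelblue")
plot(scores[, 1], scores[, 2],
     col = sample_cols[rownames(scores)], pch = 19,
     xlab = "PC1", ylab = "PC2")
text(scores[, 1], scores[, 2], labels = rownames(scores), pos = 3,
     col = sample_cols[rownames(scores)])
```

Note that with only two samples all of the variance ends up on PC1, so the plot mainly confirms that the labels and colors work. If you prefer to stay with ggfortify, its autoplot method for prcomp objects also takes colour and label arguments, which should give a similar result once the samples are in the rows.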
A quick comment: you may try my R / Bioconductor package: PCAtools