Question

Adding Percentage in PCA

1

7.2 years ago

1769mkc ★ 1.2k

This is my code for PCA using SVD. I get a neat plot, but I want to add the percentage of variance to each axis, as the DESeq2 PCA plot does. I am not sure how DESeq2 adds it.

library(ISLR)
ncidat = (NON_CODING[,-1])
dim(ncidat)
ncidat[1:5,1:16]
X = t(scale(t(ncidat),center=TRUE,scale=FALSE))
View(X)
################
sv = svd(t(X))
U = sv$u
V = sv$v
D = sv$d
###############

## the rank of a matrix can be computed in R with
qr(t(X))$rank
cols = as.numeric(as.factor(colnames(ncidat)))
plot(U[,1],U[,2],type="n",xlab="PC1",ylab="PC2")
text(U[,1],U[,2],colnames(X),col=cols)

par(mfrow=c(1,2))
Z = t(X)%*%V

# plot PC1 vs PC2
plot(Z[,1], Z[,2], type ="n", xlab="PC1", ylab="PC2")
text(Z[,1], Z[,2], colnames(X), col=cols)

pc_dat<- data.frame(type = rownames(Z), PC1 = Z[,1], PC2= Z[,2])
library(ggplot2)
p<-ggplot(pc_dat,aes(x=PC1, y=PC2, col=type)) + geom_point() + 
  geom_text(aes(label = type), hjust=0, vjust=0)

p<-p + theme(text = element_text(size = 25))
p

Any suggestions on how to add the percentage to the PC1 and PC2 axes would be highly appreciated.

R • 7.0k views

ADD COMMENT • link 7.2 years ago by 1769mkc ★ 1.2k

0

Did you try the code from "3.4 The percentage" at http://huboqiang.cn/2016/03/03/RscatterPlotPCA ?

ADD REPLY • link 7.2 years ago by Sej Modha 5.3k

0

I have seen that code, but I am not sure how to pass my principal components to it. I mean, which object in my code stores the principal components?

Can you have a look at my code and let me know?

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

1

The sv$d values (D above) are related to the PC standard deviations, so convert sv$d to standard deviations with the appropriate formula. If you want to display the percentage for each PC, it is simpler to run PCA directly than to work from the SVD output.
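A minimal sketch of that conversion, assuming the matrix passed to svd() has samples in rows and has been column-centered (as t(X) is in the question). For such a matrix with n rows, the PC standard deviations are sv$d / sqrt(n - 1); the n - 1 factor cancels when computing percentages. The matrix M is simulated toy data, and the names pc_sdev and pct are illustrative only.

```r
# Toy stand-in for t(X): 6 samples (rows) x 20 genes (columns); simulated data.
set.seed(1)
M <- matrix(rnorm(6 * 20), nrow = 6)
M <- scale(M, center = TRUE, scale = FALSE)  # center each gene (column)

sv <- svd(M)

# PC standard deviations from the singular values (n = number of samples)
pc_sdev <- sv$d / sqrt(nrow(M) - 1)

# Percent of variance per PC; the (n - 1) factor cancels, so sv$d^2 is enough
pct <- sv$d^2 / sum(sv$d^2) * 100

round(pct[1:2], 1)
```

With the objects from the question, the same idea would be D^2 / sum(D^2) * 100, whose first two entries belong to PC1 and PC2.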

ADD REPLY • link 7.2 years ago by cpad0112 21k

0

Okay, I will give it a try.

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

score 6 · Answer 1 · 2017-12-11

6

7.2 years ago

Kevin Blighe 89k

Take a leaf out of my own PCA code: A: PCA plot from read count matrix from RNA-Seq

The formula for converting standard deviations to percent explained variation is:

((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100

i.e., square each PC's standard deviation, divide by the sum of all squared PC standard deviations, and multiply by 100.
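As a rough illustration of how this formula could feed the axis labels, here is a sketch using prcomp on simulated data. The matrix expr and the names pct_var, x_lab and y_lab are assumptions made for the example, not part of the linked code.

```r
# Simulated stand-in for a samples x genes expression matrix (not real data).
set.seed(42)
expr <- matrix(rnorm(8 * 50), nrow = 8,
               dimnames = list(paste0("sample", 1:8), paste0("gene", 1:50)))

project.pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Percent of total variance explained by each PC
pct_var <- (project.pca$sdev^2 / sum(project.pca$sdev^2)) * 100

# Axis labels of the form "PC1 (xx.x%)"
x_lab <- sprintf("PC1 (%.1f%%)", pct_var[1])
y_lab <- sprintf("PC2 (%.1f%%)", pct_var[2])

# Base R plot of the sample scores with the percentage in each axis label
plot(project.pca$x[, 1], project.pca$x[, 2], type = "n",
     xlab = x_lab, ylab = y_lab)
text(project.pca$x[, 1], project.pca$x[, 2], rownames(expr))
```

For the ggplot2 object p in the question, the same strings could be added with p + labs(x = x_lab, y = y_lab).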

ADD COMMENT • link 7.2 years ago by Kevin Blighe 89k

0

I am going to use your code. Would you suggest SVD or PCA?

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

2

A little-known tip: PCA and SVD are more or less the 'same' thing. There are various 'quirks' like this in statistics, where different methods end up producing the same results. If you look at the code for the base R function prcomp, which performs PCA, you will see that it in fact calls the base svd function.

It is possible to perform PCA using non-SVD methods though.
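The relationship between the two can be checked on a small example. This is only a sketch on simulated data; the variable names are illustrative. It compares prcomp with svd applied to the same column-centered matrix.

```r
set.seed(7)
M <- matrix(rnorm(5 * 12), nrow = 5)   # simulated: 5 samples x 12 genes

pca <- prcomp(M, center = TRUE, scale. = FALSE)
sv  <- svd(scale(M, center = TRUE, scale = FALSE))

# Standard deviations agree once the singular values are rescaled by sqrt(n - 1)
sdev_equal <- all.equal(pca$sdev, sv$d / sqrt(nrow(M) - 1))

# Sample scores agree up to the sign of each component: scores = U %*% diag(d)
scores_svd   <- sv$u %*% diag(sv$d)
scores_equal <- all.equal(abs(unname(pca$x)), abs(scores_svd))

sdev_equal
scores_equal
```

Absolute values are compared for the scores because the sign of a principal component is arbitrary.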

ADD REPLY • link 7.2 years ago by Kevin Blighe 89k

0

I was using this reference because I was looking for a way to find the genes that drive PC1 and PC2, say the most variable genes across the samples. I would like to extract those genes from the PCA and use them for downstream analysis. Is there a simpler way to do this?

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

0

If you used my code, take a look at:

project.pca$rotation

This contains the component loadings of each gene on each PC. The rows follow the order of the genes in the input, so sort by absolute loading to rank genes by the strength of their association with a PC.

ADD REPLY • link 6.0 years ago by Kevin Blighe 89k

1

Great help, I will try it and let you know.

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

0

@Kevin I tried your code and checked project.pca$rotation. It contains all the genes I used: the number of genes in my input and in project.pca$rotation is the same, apart from the non-numeric column. But is it in order? I mean, is it in the same order as my input list?

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

0

Each gene should have a value, which is unitless but measures the strength of the gene's relationship with the eigenvector (principal component). Large absolute values indicate genes that contribute more to the separation of samples along the eigenvector in question.
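One way to act on this is to rank genes by the absolute value of their loading on a given PC. The sketch below uses simulated data; top_genes is a hypothetical helper written for this example, and the cutoff n is arbitrary.

```r
# Simulated samples x genes matrix standing in for real expression data.
set.seed(3)
expr <- matrix(rnorm(6 * 30), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:30)))
project.pca <- prcomp(expr)

loadings <- project.pca$rotation   # genes in rows (input order), PCs in columns

# Hypothetical helper: names of the n genes with the largest absolute loading
top_genes <- function(loadings, pc, n = 10) {
  ord <- order(abs(loadings[, pc]), decreasing = TRUE)
  rownames(loadings)[ord[seq_len(min(n, length(ord)))]]
}

top_pc1 <- top_genes(loadings, "PC1", n = 5)
top_pc2 <- top_genes(loadings, "PC2", n = 5)

top_pc1
top_pc2
```

The resulting gene names can then be used to subset the original matrix for downstream analysis.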

ADD REPLY • link 6.3 years ago by Kevin Blighe 89k

0

And how do I extract that, or get the genes from those principal components? Is it done manually, or is there a method I can use in R?

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k

1

The genes should be the row names of the matrix project.pca$rotation.

ADD REPLY • link 7.2 years ago by Kevin Blighe 89k

0

Yes, they are, like this:

pca$rotation
                                       PC1           PC2           PC3           PC4
5S_rRNA                      -1.090574e-02 -2.412665e-03  1.637689e-02 -3.603865e-02
AB019441.29                  -1.928250e-02  1.821083e-03 -9.713724e-03 -1.978737e-03

But my question is how to find which genes are associated with PC1 and PC2, i.e. which contribute most to the first two principal components.

I have created a new question, which also cites the paper:

Extracting features or gene from PCA after calculating PCA for downstream analysis

Please have a look and let me know.

ADD REPLY • link 7.2 years ago by 1769mkc ★ 1.2k