How to improve PCA visualization using ggplot?
3.1 years ago
mthm ▴ 50

I have used this script to draw a PCA plot from the eigenvalues and eigenvectors:

library(ggplot2)

percentage <- round((eigenval/(sum(eigenval))*100), 2)
percentage <- as.matrix(percentage)
percentage <- paste0(names(percentage), " (", percentage, "%)")
Names <- c("mn27hd", "mdkk987", "mnsdnu83", "sjednu83", "bjeo972s")
pop.colour <- c("blue", "red", "green", "orange", "brown")
ggplot(eigenvec, aes(x=PC1, y=PC2, colour=pop.colour, label=Names)) +
    geom_point(size=3) +
    geom_text(aes(label=Names), hjust=0, vjust=-0.5) +
    xlab(percentage[1]) + ylab(percentage[2])
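For reference, the percentage of variance explained by each PC is its eigenvalue divided by the sum of all eigenvalues, times 100. A minimal base R sketch that builds axis labels of the form "PC1 (40%)"; the eigenvalues are made up for illustration and are not the data from this question:

```r
# Hypothetical eigenvalues, for illustration only
eigenval <- c(4, 3, 2, 1)

# Percentage of variance explained by each PC, rounded to 2 decimals
percentage <- round(eigenval / sum(eigenval) * 100, 2)

# Build labels such as "PC1 (40%)" from the PC index and the percentage
pc.labels <- paste0("PC", seq_along(percentage), " (", percentage, "%)")
print(pc.labels)
```

If eigenval was read with read.table(), it is a one-column data frame; taking the column first (eigenval[[1]]) gives the plain numeric vector this sketch assumes.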


How can I:

  1. change the x- and y-axis labels to read PC1 (36.61%) and PC2 (33.7%)?

  2. move only the pink label slightly to the left? When I shift all the labels, the two leftmost labels go outside the image or end up too close to the edge.

  3. change the legend title and categories from the colour names to the population names?

3.1 years ago
Sam ★ 4.8k

I prefer to use labs instead of xlab and ylab, and to add the information to the data.frame directly. Do you only have these five data points, and are you certain that the names correspond to the correct PC values? If so, try:

library(ggplot2)

# percentage of variance explained by each PC
percentage <- round((eigenval/(sum(eigenval))*100), 2)
percentage <- as.matrix(percentage)
# axis labels such as "PC1 (36.61%)"
percentage <- paste0("PC", seq_along(percentage), " (", percentage, "%)")
eigenvec$Pop <- c("mn27hd", "mdkk987", "mnsdnu83", "sjednu83", "bjeo972s")

pop.colour <- c("blue", "red", "green", "orange", "brown")
ggplot(eigenvec, aes(x=PC1, y=PC2, colour=Pop, label=Pop)) +
    geom_point(size=3) + geom_text(hjust=-0.1, vjust=-0.5) +
    scale_color_manual(values = pop.colour) +
    labs(x=percentage[1], y=percentage[2], color = "Population")

I don't have mock data to play with, so there might be bugs in the code. You can modify hjust in geom_text to adjust the horizontal shift of the labels.
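Since no data were posted, here is a self-contained sketch of the same plot with mock data. The PC coordinates and the "PC1 (40%)" style labels are invented; the population names and colours are from the question. It adds two things beyond the answer, both assumptions about what is wanted: the colour vector is named by population, so each population keeps its intended colour (with unnamed values, scale_colour_manual assigns colours in factor-level order, which is alphabetical for character data), and a per-point hjust column lets a single label be shifted without moving the others (here, arbitrarily, the third one).

```r
library(ggplot2)

# Mock PC coordinates (invented values, for illustration only)
eigenvec <- data.frame(
  PC1 = c(-0.6, -0.4, 0.1, 0.3, 0.6),
  PC2 = c(0.2, -0.5, 0.6, -0.1, -0.2),
  Pop = c("mn27hd", "mdkk987", "mnsdnu83", "sjednu83", "bjeo972s"),
  stringsAsFactors = FALSE
)

# Hypothetical axis labels; in practice build them from the eigenvalues
pc.labels <- c("PC1 (40%)", "PC2 (30%)")

# Name the colours by population so the mapping does not depend on level order
pop.colour <- setNames(c("blue", "red", "green", "orange", "brown"), eigenvec$Pop)

# Per-point horizontal justification: 0 puts the text to the right of the point,
# 1 puts it to the left; only the third label is flipped here
eigenvec$hjust <- c(0, 0, 1, 0, 0)

p <- ggplot(eigenvec, aes(x = PC1, y = PC2, colour = Pop, label = Pop)) +
  geom_point(size = 3) +
  geom_text(aes(hjust = hjust), vjust = -0.5, show.legend = FALSE) +
  scale_colour_manual(values = pop.colour) +
  labs(x = pc.labels[1], y = pc.labels[2], colour = "Population")

print(p)
```

Because hjust is mapped from the data, any label can be moved on its own by editing its entry in the hjust column.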

Thanks, the legend worked, but the PC1 and PC2 axis labels didn't. It says:

Error in "PC1 (" + percentage[1] : 
  non-numeric argument to binary operator

This is what my percentage looks like:

> percentage
[1] " (36.61%)" " (33.7%)"  " (19.75%)" " (18.03%)" " (-0.1%)"
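This output follows from the first lines of the question's code: as.matrix() returns a matrix without a names attribute, so names(percentage) is NULL and paste0() produces only the bracketed part. A minimal sketch that reproduces this with made-up eigenvalues (not the poster's data):

```r
# Hypothetical eigenvalues, for illustration only
eigenval <- c(4, 3, 2, 1)

percentage <- round((eigenval/(sum(eigenval))*100), 2)
percentage <- as.matrix(percentage)

# A matrix created this way has no names attribute, so names() returns NULL
print(names(percentage))

# paste0() treats the NULL as "", leaving only " (40%)", " (30%)", ...
percentage <- paste0(names(percentage), " (", percentage, "%)")
print(percentage)

# Prepending the PC name gives the intended axis label
x.label <- paste0("PC1", percentage[1])
print(x.label)
```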

I can only set hjust=0.4 before the green and blue labels are pushed out of the graph, and that is still not enough to bring the purple label fully inside. Is it possible to change the dimensions of the graph so that the labels fit?
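One way to make room for the labels is to widen the axis padding rather than the labels' offset. A sketch under these assumptions: mock coordinates (invented), ggplot2 3.3.0 or later for expansion(), and a PDF written to a temporary file. By default a continuous scale pads each side of the data range by 5%; this raises it to 20% on the x axis, and coord_cartesian(clip = "off") lets any text that still sticks out be drawn past the panel edge instead of being cut off (it may then overlap the axis or legend).

```r
library(ggplot2)

# Mock data (invented values, for illustration only)
df <- data.frame(
  PC1 = c(-0.6, -0.4, 0.1, 0.3, 0.6),
  PC2 = c(0.2, -0.5, 0.6, -0.1, -0.2),
  Pop = c("mn27hd", "mdkk987", "mnsdnu83", "sjednu83", "bjeo972s"),
  stringsAsFactors = FALSE
)

p <- ggplot(df, aes(x = PC1, y = PC2, colour = Pop, label = Pop)) +
  geom_point(size = 3) +
  geom_text(hjust = 0, vjust = -0.5, show.legend = FALSE) +
  # 20% padding on each side of the x range, 10% on the y range
  scale_x_continuous(expand = expansion(mult = 0.2)) +
  scale_y_continuous(expand = expansion(mult = 0.1)) +
  # Do not clip text that extends beyond the panel
  coord_cartesian(clip = "off")

# A wider output size also leaves more room (width/height in inches)
out.file <- file.path(tempdir(), "pca.pdf")
ggsave(out.file, plot = p, width = 8, height = 5)
```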

My bad, I should have used , instead of +. I have now fixed that.

If you don't mind installing an additional package, I have found ggrepel really useful in this scenario. You can find examples here.
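A minimal ggrepel sketch, assuming the package is installed and using invented coordinates. geom_text_repel() pushes labels away from the points and from each other, and by default keeps them within the plot area, so no manual hjust tuning is needed.

```r
library(ggplot2)
library(ggrepel)  # install.packages("ggrepel") if needed

# Mock data (invented values, for illustration only)
df <- data.frame(
  PC1 = c(-0.6, -0.4, 0.1, 0.3, 0.6),
  PC2 = c(0.2, -0.5, 0.6, -0.1, -0.2),
  Pop = c("mn27hd", "mdkk987", "mnsdnu83", "sjednu83", "bjeo972s"),
  stringsAsFactors = FALSE
)

p <- ggplot(df, aes(x = PC1, y = PC2, colour = Pop, label = Pop)) +
  geom_point(size = 3) +
  # Repelled labels replace geom_text(); hide them from the legend
  geom_text_repel(show.legend = FALSE) +
  labs(colour = "Population")

print(p)
```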

Great, I fixed the labels with this and it worked fine: labs(x=paste0("PC1", percentage[1]), y=paste0("PC2", percentage[2]))
