Remove the letter "a" from a biplot legend in ggplot
6 months ago
Assa Yeroslaviz ★ 1.9k

I'm trying to create a biplot in ggplot2 and managed to do it after a lot of trial and error, but I can't get rid of the "a" in the legend. I think the difficulty arises from the fact that the data set is not set globally but inside the geom_segment() command, and I can't find a way around it. The code is below, as well as the output.

In the legend I get the "a" with the colours. I would like to know how I can get rid of the "a" and how I can set specific colours for these groups.

Thanks in advance.


library(ggplot2)
library(ggrepel)

# 10 samples x 6 genes = 60 values (rnorm(100) would not fit the matrix exactly)
data <- matrix(rnorm(60), nrow = 10, ncol = 6)
colnames(data) <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")

pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

scores <- as.data.frame(pca_result$x)
scores$sample <- rownames(scores)

loadings <- as.data.frame(pca_result$rotation)
loadings$variable <- rownames(loadings)
loadings$group <- rep(c("group1", "group2", "group3"), each = 2)

explained_variance <- summary(pca_result)$importance[2, ]
percent_var_PC1 <- round(explained_variance[1] * 100, 1)
percent_var_PC2 <- round(explained_variance[2] * 100, 1)

ggplot() +
  # Plot the scores (samples)
  geom_point(data = scores, aes(x = PC1, y = PC2), color = "#0072B2", size = 3) +
  geom_text_repel(data = scores, aes(x = PC1, y = PC2, label = sample), color = "#0072B2", size = 2) +

  # Plot the loadings (variables) as arrows
  geom_segment(data = loadings, aes(x = 0, y = 0, xend = PC1*5, yend = PC2*5),
               arrow = arrow(length = unit(0.3, "cm")), color = "grey") +
  geom_text_repel(data = loadings, aes(x = PC1*5, y = PC2*5, label = variable, color = group), size = 3) +
  # Remove the grid and background, keep black axis lines
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  # Add axis labels and title
  labs(title = "test PCA",
       x = paste0("PC1 (", percent_var_PC1, "%)"),
       y = paste0("PC2 (", percent_var_PC2, "%)"))

[Figure: PCA plot]

6 months ago
DGTool ▴ 290

In recent ggplot2 versions, geom_text() (and other related functions) have a key_glyph parameter, which changes what is displayed in the legend (e.g., to replace the "a" with just a coloured point, use key_glyph = "point"). I don't know whether geom_text_repel or geom_segment accept a similar parameter, but it might be something to explore and try out. (See the ggplot2 reference.)
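A minimal sketch of the key_glyph idea with plain ggplot2::geom_text(). The toy data frame `df` (columns x, y, label, group) is made up for illustration and is not the PCA data from the question.

```r
library(ggplot2)

# Hypothetical toy data: six labelled points in three groups
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  label = paste0("gene", 1:6),
  group = rep(c("group1", "group2", "group3"), each = 2)
)

# key_glyph = "point" makes the colour legend draw a point
# instead of the default text key (the letter "a")
p <- ggplot(df, aes(x = x, y = y, label = label, colour = group)) +
  geom_text(key_glyph = "point")

print(p)
```

The labels on the plot are still text. Only the legend key changes.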


Thank you, and yes, it does accept it. I just changed the line to geom_text_repel(data = loadings, aes(x = PC1*5, y = PC2*5, label = variable, color = group), size = 3, key_glyph = "point") + and it worked.

6 months ago
kalavattam ▴ 330

The "a" in the legend is due to how ggrepel::geom_text_repel() handles the color aesthetic for text labels. You are mapping the color aesthetic to group in the loadings data, so ggplot2 automatically creates a legend for this aesthetic. Because this is a text layer, its legend key is drawn as a sample letter "a" rather than as a point or an arrow.

We can fix this by suppressing the legend for ggrepel::geom_text_repel() with show.legend = FALSE.
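A minimal sketch of show.legend = FALSE on a text layer. It uses ggplot2::geom_text() as a stand-in for ggrepel::geom_text_repel() so it runs with ggplot2 alone, and the toy data frame `df` is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: six labelled points in three groups
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  label = paste0("gene", 1:6),
  group = rep(c("group1", "group2", "group3"), each = 2)
)

# show.legend = FALSE keeps this layer out of the legend, so no "a" key is drawn;
# the labels on the plot are still coloured by group
p <- ggplot(df, aes(x = x, y = y, label = label, colour = group)) +
  geom_text(show.legend = FALSE)

print(p)
```

With no other layer mapping colour, this removes the colour legend entirely. The points-based legend below restores one.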

...and how I can set specific colours to this groups.

We can assign colors to the groups using scale_color_manual().
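A minimal sketch of scale_color_manual() with a named vector, again on a hypothetical toy data frame `df`. Naming the values ties each colour to a specific group level rather than to the order of the levels.

```r
library(ggplot2)

# Hypothetical toy data: six points in three groups
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  group = rep(c("group1", "group2", "group3"), each = 2)
)

# Names must match the group levels exactly
group_colours <- c(group1 = "red", group2 = "blue", group3 = "green")

p <- ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_point(size = 3) +
  scale_color_manual(values = group_colours)

print(p)
```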

Finally, if you want a legend that displays point glyphs instead of text (a), you can add a geom_point() layer specifically for the loadings and map the color aesthetic to the group.
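A minimal sketch of that pattern on a hypothetical toy data frame `df`, with ggplot2::geom_text() standing in for ggrepel::geom_text_repel(). The text layer is hidden from the legend, and a point layer with the same colour mapping supplies the legend keys.

```r
library(ggplot2)

# Hypothetical toy data: six labelled points in three groups
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  label = paste0("gene", 1:6),
  group = rep(c("group1", "group2", "group3"), each = 2)
)

p <- ggplot(df, aes(x = x, y = y, colour = group)) +
  # Labels coloured by group but excluded from the legend
  geom_text(aes(label = label), show.legend = FALSE, vjust = -1) +
  # Points coloured by group; these provide the point glyphs in the legend
  geom_point(size = 3)

print(p)
```

Both layers share one colour scale, so a label and its point always get the same colour.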

I've updated your code example with the above in mind:

#!/usr/bin/env Rscript

library(ggplot2)
library(ggrepel)

set.seed(24)  # Set seed for reproducibility

# 10 samples x 6 genes = 60 values
data <- matrix(rnorm(60), nrow = 10, ncol = 6)
colnames(data) <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")

pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

scores <- as.data.frame(pca_result$x)
scores$sample <- rownames(scores)

loadings <- as.data.frame(pca_result$rotation)
loadings$variable <- rownames(loadings)
loadings$group <- rep(c("group1", "group2", "group3"), each = 2)

explained_variance <- summary(pca_result)$importance[2, ]
percent_var_PC1 <- round(explained_variance[1] * 100, 1)
percent_var_PC2 <- round(explained_variance[2] * 100, 1)

p <- ggplot() +
    #  Plot the scores (samples)
    geom_point(
        data = scores,
        aes(x = PC1, y = PC2),
        color = "#0072B2",
        size = 3
    ) +
    geom_text_repel(
        data = scores,
        aes(x = PC1, y = PC2, label = sample),
        color = "#0072B2",
        size = 2
    ) +

    #  Plot the loadings (variables) as arrows
    geom_segment(
        data = loadings, aes(x = 0, y = 0, xend = PC1 * 5, yend = PC2 * 5),
        arrow = arrow(length = unit(0.3, "cm")),
        color = "grey"
    ) +
    geom_text_repel(
        data = loadings,
        aes(x = PC1 * 5, y = PC2 * 5, label = variable, color = group),
        size = 3,
        show.legend = FALSE  # Suppress the legend for text labels
    ) +

    #  Add points for the loadings with color mapped to group for legend
    geom_point(
        data = loadings,
        aes(x = PC1 * 5, y = PC2 * 5, color = group),
        size = 3
    ) +

    #  Remove the grid and background; keep black axis lines
    theme(
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "black")
    ) +

    #  Add axis labels and title
    labs(
        title = "test PCA",
        x = paste0("PC1 (", percent_var_PC1, "%)"),
        y = paste0("PC2 (", percent_var_PC2, "%)")
    ) +

    #  Manually set colors for the groups; change the below colors to
    #+ whatever you want
    scale_color_manual(
        values = c("group1" = "red", "group2" = "blue", "group3" = "green")
    )

#  Under Rscript, this draws to Rplots.pdf unless another device is opened
print(p)

