ggplot2 labeling and coloring specific Data points in Scatter Plot
4.7 years ago

Hi All,

My data frame has two columns: gene names and Z scores. I am using the ggplot2 package to make a scatter plot. I was able to create the plot, but I don't know how to show only specific gene names on it, or how to change the color of the gene names that are shown.

Here is what I am using:

library(ggplot2)
Data <- read.csv("Data.csv", header = TRUE)
# geom_point() is needed to actually draw the points
ggplot(Data, aes(x = Count, y = Z_Score)) +
  geom_point()

Let's say I want to show the gene names "Gene1", "Gene2", and "Gene3" on the plot, in "red", "blue", and "green" respectively.

Many Thanks, Hamid

ggplot2 scatterplot point labeling • 23k views

This is definitely a https://stackoverflow.com/ kind of question. By the way, you should provide a concrete example dataset; the information you have given is not enough.


You can find various generic solutions on Stack Overflow (as already pointed out) or sthda.com. Basically, you would need to have the gene names as a factor and map it to color inside aes, as color = Gene. However, if you have 1000 genes and want to give each of them a specific color, you will run into a problem with the color scale. You will end up choosing some sort of gradient, which makes it harder to tell whether you are trying to display specific genes or some sort of continuous (gene expression) information.
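One way around the color-scale problem described above is to collapse every gene you do not care about into a single "other" level before mapping the column to color. The following is only a minimal sketch under assumptions: the toy data, the `highlight` column, and the `selected` vector are made up for illustration, and the colors are whatever ggplot2 picks by default.

```r
library(ggplot2)

# Toy data standing in for the real table (hypothetical values)
df <- data.frame(
  gene = paste0("Gene", 1:10),
  Count = 1:10,
  Z_Score = as.numeric(scale(1:10)),
  stringsAsFactors = FALSE
)

# Genes to highlight; every other gene is collapsed into "other"
selected <- c("Gene1", "Gene2", "Gene3")
df$highlight <- factor(
  ifelse(df$gene %in% selected, df$gene, "other"),
  levels = c(selected, "other")
)

# The legend has four entries no matter how many genes are in the data
p <- ggplot(df, aes(x = Count, y = Z_Score, colour = highlight)) +
  geom_point(size = 3)
```

Calling print(p) draws the plot. Because the factor has only four levels, a discrete scale stays readable even with thousands of genes.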

4.7 years ago

I think that Hamid needs to use subset(). A reproducible example, here:

df <- data.frame(
  gene = c('a', 'b', 'c', 'd', 'e'),
  Count = c(1,2,3,4,5),
  Zscore = scale(c(1,2,3,4,5)))

df$genecolour <- rep('black', nrow(df))
df$genecolour[df$gene == 'b'] <- 'firebrick1'
df$genecolour[df$gene == 'c'] <- 'royalblue'
df$genecolour[df$gene == 'e'] <- 'forestgreen'

df

  gene Count    Zscore     genecolour
  a    1       -1.2649111  black
  b    2       -0.6324555  firebrick1
  c    3        0.0000000  royalblue
  d    4        0.6324555  black
  e    5        1.2649111  forestgreen


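library(ggplot2)
library(ggthemes)
ggplot(data = df, aes(x = Count, y = Zscore, label = gene)) +
  # colour each point using the precomputed genecolour column
  geom_point(size = 15.0, colour = df$genecolour) +
  # label only the genes of interest
  geom_label(data = subset(df, gene %in% c('b','c','e'))) +
  # apply the complete theme first so it does not reset the legend setting
  theme_wsj() +
  theme(legend.position = 'none')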
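As a variation on the code above, a column that already holds colour names can be mapped inside aes() and passed through unchanged with scale_colour_identity(), instead of handing df$genecolour to the layer directly. This is a hedged sketch of that alternative, not a claim that it is the better approach; it rebuilds the same toy data so that it runs on its own, and leaves out ggthemes.

```r
library(ggplot2)

df <- data.frame(
  gene = c("a", "b", "c", "d", "e"),
  Count = c(1, 2, 3, 4, 5),
  Zscore = as.numeric(scale(c(1, 2, 3, 4, 5))),
  stringsAsFactors = FALSE
)

# Same colour assignment as above: black unless the gene is highlighted
df$genecolour <- "black"
df$genecolour[df$gene == "b"] <- "firebrick1"
df$genecolour[df$gene == "c"] <- "royalblue"
df$genecolour[df$gene == "e"] <- "forestgreen"

p <- ggplot(df, aes(x = Count, y = Zscore, label = gene, colour = genecolour)) +
  geom_point(size = 15) +
  # a fixed label colour overrides the mapped colour for this layer
  geom_label(data = subset(df, gene %in% c("b", "c", "e")), colour = "black") +
  # use the values in genecolour as literal colours (no legend by default)
  scale_colour_identity()
```

Keeping the colour inside aes() means the colours stay attached to their rows if the data frame is later filtered or reordered.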


Snooker (billiards), anyone?

Kevin

4.7 years ago
lessismore ★ 1.4k

Using the dataset (without the predefined colours) of my master and life coach @Kevin, you could do this by adding an extra geom_point layer for each gene you want to highlight. It just depends on how many genes you want to highlight...

library(tidyverse)
library(ggrepel)

df <- data.frame(gene = c('a', 'b', 'c', 'd', 'e'), 
             Count = c(1,2,3,4,5), 
             Zscore = scale(c(1,2,3,4,5)))

df %>%
    ggplot(aes(Count, Zscore,label = gene)) +
    geom_point() +
    geom_point(data = df %>% filter(gene == "a"), color = "red") +
    geom_point(data = df %>% filter(gene == "b"), color = "blue") +
    geom_point(data = df %>% filter(gene == "c"), color = "green") +
    geom_text_repel()


You can also color points based on a specific value, or a range of values, in your Z-score column, like this:

df %>%
    ggplot(aes(Count, Zscore,label = gene)) +
    geom_point() +
    geom_point(data = df %>% filter(gene == "a"), color = "red") +
    geom_point(data = df %>% filter(Zscore >= 0), color = "green") +
    geom_text_repel()
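If the number of highlighted genes grows, one geom_point layer per gene gets repetitive, and the calls above label every gene rather than only the chosen ones. A possible sketch, assuming a named lookup vector (the `highlight` and `hl` names are made up here), uses one extra layer for all highlighted points and labels only those, each in its own colour:

```r
library(ggplot2)
library(ggrepel)

df <- data.frame(
  gene = c("a", "b", "c", "d", "e"),
  Count = c(1, 2, 3, 4, 5),
  Zscore = as.numeric(scale(c(1, 2, 3, 4, 5))),
  stringsAsFactors = FALSE
)

# Names are the genes to highlight, values are their colours
highlight <- c(a = "red", b = "blue", c = "green")

# Keep only the highlighted rows and look up each gene's colour by name
hl <- df[df$gene %in% names(highlight), ]
hl$colour <- unname(highlight[hl$gene])

p <- ggplot(df, aes(Count, Zscore)) +
  geom_point(colour = "grey60") +
  geom_point(data = hl, colour = hl$colour) +
  # labels are drawn for the highlighted genes only, in matching colours
  geom_text_repel(data = hl, aes(label = gene), colour = hl$colour)
```

Adding another gene then only requires another entry in the highlight vector.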



Very good / ¡Muy bien!

4.7 years ago
sgriff13 • 0

Ooh, I literally did this last night!

You need to include the label names as a column in your data set, e.g.:

newcolumn <- c(rep("Lympho",100),rep("Epithel",100),rep("Erythr",100),rep("Fibro",101))

Once you have done this, you can map the colour of your plot to that column, and it will also be used for the legend. I'm not sure how you specify the actual colours, though.

mynewdata <- data.frame(myolddata, newcolumn)
p <- ggplot() +
  geom_point(data = mynewdata,
             mapping = aes(x = comp1,
                           y = comp2,
                           colour = newcolumn))
print(p)

Or, if you already have such a column in your data set, just replace 'newcolumn' with its name.
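On specifying the actual colours: ggplot2's scale_colour_manual() accepts a named vector that maps each label to a colour. Here is a minimal sketch; the small `mynewdata` below is a made-up stand-in for the real data, and the colour choices are arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for mynewdata: two components plus a label column
set.seed(1)
mynewdata <- data.frame(
  comp1 = rnorm(8),
  comp2 = rnorm(8),
  newcolumn = rep(c("Lympho", "Epithel", "Erythr", "Fibro"), each = 2),
  stringsAsFactors = FALSE
)

# One colour per label; the names must match the values in newcolumn
label_colours <- c(
  Lympho = "red",
  Epithel = "blue",
  Erythr = "forestgreen",
  Fibro = "orange"
)

p <- ggplot(mynewdata, aes(x = comp1, y = comp2, colour = newcolumn)) +
  geom_point() +
  scale_colour_manual(values = label_colours)
```

Because the vector is named, the colours follow the labels rather than the order in which the labels happen to appear in the data.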
