Select specific values from a table
2.9 years ago

Hello

I have been using ggplot2 to plot two sets of values from a data table called "TableFCvsFC". I am plotting the values in the "SM" column against the values in the "SPL" column; both correspond to the expression of the same genes measured under different conditions. As I don't know how to attach the data table to this post, here is an image of a fragment of the table as an example:

[Image: fragment of the TableFCvsFC data table]

So far I have used the following code to draw the graph shown below:

library(ggplot2)

Change <- openxlsx::read.xlsx("/Users/Desktop/TableFCvsFC.xlsx")
LC <- ggplot(Change, aes(x=SM, y=SPL, family = "sans"))

LC + geom_vline(xintercept = 1e+00) + 
  geom_hline(yintercept = 1e+00) + 
  geom_abline(intercept = 0, slope = 1) +
  geom_point(size=2, shape=16, alpha = 1, color = "black") +
  scale_x_continuous(trans = "log10") + 
  scale_y_continuous(trans = "log10")

[Image: resulting scatter plot of SPL vs SM on log10 axes]

What I need now is to highlight the dots inside the orange rectangle by coloring them red, while the rest of the dots remain black. The dots to be colored red are those whose value in the "SM" column is greater than 2 and whose value in the "SPL" column is less than 1. I suppose I could create a new variable that flags the rows meeting those requirements and then set the colour according to that variable, but I don't know how to write that part of the code.
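The idea of a separate flag variable can be sketched in a few lines of R. This assumes a made-up five-gene table in place of TableFCvsFC and a hypothetical column name `highlight`; it is an illustration, not the asker's data:

```r
library(ggplot2)

# Hypothetical five-gene table standing in for TableFCvsFC
Change <- data.frame(
  GeneName = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  SM  = c(3.5, 0.8, 5.0, 2.5, 1.2),
  SPL = c(0.4, 2.0, 0.9, 1.5, 0.3)
)

# New logical column (hypothetical name): TRUE when SM > 2 and SPL < 1
Change$highlight <- Change$SM > 2 & Change$SPL < 1
print(Change$highlight)

# Map colour to the flag; FALSE -> black, TRUE -> red
p <- ggplot(Change, aes(x = SM, y = SPL)) +
  geom_point(aes(color = highlight), size = 2, shape = 16) +
  scale_color_manual(values = c(`FALSE` = "black", `TRUE` = "red")) +
  scale_x_log10() +
  scale_y_log10()
p
```

Here `scale_x_log10()` is the shorthand for `scale_x_continuous(trans = "log10")` used in the original code.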

I would also like to add labels to the 10 red dots with the highest values in "SM". Each label should contain the corresponding "GeneName" for that dot.

Can anybody help me? As I cannot share the complete data table (sorry about that), you can create a similar table with random values to test the code. I can then adapt it to my own data.
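A mock table along these lines could stand in for TableFCvsFC when testing. It is a sketch under assumptions: the gene names are invented, and the fold changes are random draws (2 raised to a normally distributed log2 fold change), chosen only so that every value is positive and can be drawn on log10 axes:

```r
# Hypothetical mock of TableFCvsFC: GeneName plus two positive fold-change columns
set.seed(42)
n <- 500
Change <- data.frame(
  GeneName = sprintf("Gene%03d", seq_len(n)),
  SM  = 2^rnorm(n, mean = 0, sd = 1.5),  # random log2 fold change, back-transformed
  SPL = 2^rnorm(n, mean = 0, sd = 1.5)
)

head(Change)

# How many genes fall inside the "red" rectangle
sum(Change$SM > 2 & Change$SPL < 1)
```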

To color the points where SM > 2 and SPL < 1, change your geom_point line to the following:

geom_point(size=2, shape=16, alpha = 1, aes(color = SM > 2 & SPL < 1))

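If you want the dots that satisfy this condition to be red and the rest black, set the color palette manually. Because FALSE sorts before TRUE, the first color is used for the points outside the rectangle and the second for the points inside it: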

LC <- LC + scale_color_manual(values=c("black", "red"))
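Relying on the level order works, but naming the values makes the mapping explicit. A small sketch, assuming a made-up five-gene table, that also inspects the colour assigned to each point with ggplot2's `ggplot_build()`:

```r
library(ggplot2)

# Hypothetical five-gene table used only to illustrate the colour mapping
Change <- data.frame(
  GeneName = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  SM  = c(3.5, 0.8, 5.0, 2.5, 1.2),
  SPL = c(0.4, 2.0, 0.9, 1.5, 0.3)
)

p <- ggplot(Change, aes(x = SM, y = SPL)) +
  geom_point(aes(color = SM > 2 & SPL < 1), size = 2, shape = 16) +
  # Named values: the mapping no longer depends on the order of the levels
  scale_color_manual(
    name   = "SM > 2 & SPL < 1",
    values = c(`FALSE` = "black", `TRUE` = "red")
  ) +
  scale_x_log10() +
  scale_y_log10()

# Colours actually assigned to the points after the scales are trained
cols <- ggplot_build(p)$data[[1]]$colour
print(cols)
```

With this toy table, geneA and geneC (the only rows with SM > 2 and SPL < 1) should be red and the other three black.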

To label the 10 genes with the highest SM among those with SM > 2 and SPL < 1, first find those genes and then add a text layer with ggrepel. Points given an empty label ("") get no text:

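library("dplyr")    # provides %>%, filter(), slice_max() and pull()
library("ggrepel")

top_genes <- Change %>%
  filter(SM > 2 & SPL < 1) %>%
  slice_max(SM, n=10, with_ties=FALSE) %>%   # exactly 10 rows, even with tied SM values
  pull(GeneName)

LC <- LC + geom_text_repel(aes(label=ifelse(GeneName %in% top_genes, GeneName, "")), show.legend=FALSE)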
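An alternative sketch, assuming random mock data in place of TableFCvsFC: pass only the selected rows to the label layer through its `data` argument instead of blanking the other labels. The trade-off is that ggrepel then only knows about the labelled points, so labels may land on top of unlabelled dots; the `ifelse(..., "")` version keeps every point in the repulsion.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)

# Hypothetical random stand-in for TableFCvsFC (positive fold changes)
set.seed(42)
n <- 500
Change <- data.frame(
  GeneName = sprintf("Gene%03d", seq_len(n)),
  SM  = 2^rnorm(n, sd = 1.5),
  SPL = 2^rnorm(n, sd = 1.5)
)

# Rows to label: inside the rectangle, the 10 highest SM values
top_hits <- Change %>%
  filter(SM > 2 & SPL < 1) %>%
  slice_max(SM, n = 10, with_ties = FALSE)

p <- ggplot(Change, aes(x = SM, y = SPL)) +
  geom_point(aes(color = SM > 2 & SPL < 1), size = 2, shape = 16) +
  # The label layer only receives the 10 selected rows
  geom_text_repel(data = top_hits, aes(label = GeneName), show.legend = FALSE) +
  scale_color_manual(values = c(`FALSE` = "black", `TRUE` = "red")) +
  scale_x_log10() +
  scale_y_log10()
p
```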

Your final code should look something like this.

library("tidyverse")
library("ggrepel")

Change <- openxlsx::read.xlsx("/Users/Desktop/TableFCvsFC.xlsx")

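# Genes inside the rectangle (SM > 2 and SPL < 1), keeping the 10 with the highest SM
top_genes <- Change %>%
  filter(SM > 2 & SPL < 1) %>%
  slice_max(SM, n=10, with_ties=FALSE) %>%
  pull(GeneName)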

LC <- ggplot(Change, aes(x=SM, y=SPL, family = "sans")) +
  geom_vline(xintercept = 1e+00) + 
  geom_hline(yintercept = 1e+00) + 
  geom_abline(intercept = 0, slope = 1) +
  geom_point(size=2, shape=16, alpha = 1, aes(color = SM > 2 & SPL < 1)) +
  scale_x_continuous(trans = "log10") + 
  scale_y_continuous(trans = "log10") +
  geom_text_repel(aes(label=ifelse(GeneName %in% top_genes, GeneName, "")), show.legend=FALSE) +
  scale_color_manual(values=c("black", "red"))

# Display the plot
LC

Thank you so much for your help, rpolicastro! I will try that code! :D

One more question, if you have time to answer: how should I proceed if I want to color the dots with SPL > 2 & SM < 1 in blue, while the dots with SM > 2 & SPL < 1 stay red and the rest of the dots stay black? In that case I would also like to label the top 10 blue dots.
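One possible approach, sketched on random mock data, is to replace the TRUE/FALSE flag with a three-level category built with dplyr's `case_when()` and give each level its own colour. The column name `group`, its level names, and ranking the blue genes by SPL (rather than SM) are assumptions made for illustration. `max.overlaps` is a ggrepel argument (version 0.9.0 or later) that stops labels from being dropped when points are crowded.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)

# Hypothetical random stand-in for TableFCvsFC (positive fold changes)
set.seed(42)
n <- 500
Change <- data.frame(
  GeneName = sprintf("Gene%03d", seq_len(n)),
  SM  = 2^rnorm(n, sd = 1.5),
  SPL = 2^rnorm(n, sd = 1.5)
)

# One category column (hypothetical name "group") with three levels
Change <- Change %>%
  mutate(group = case_when(
    SM > 2 & SPL < 1 ~ "SM up",   # red
    SPL > 2 & SM < 1 ~ "SPL up",  # blue
    TRUE             ~ "other"    # black
  ))

# Top 10 red genes by SM and top 10 blue genes by SPL (assumed ranking)
top_red  <- Change %>% filter(group == "SM up")  %>% slice_max(SM,  n = 10, with_ties = FALSE)
top_blue <- Change %>% filter(group == "SPL up") %>% slice_max(SPL, n = 10, with_ties = FALSE)
top_genes <- c(top_red$GeneName, top_blue$GeneName)

LC <- ggplot(Change, aes(x = SM, y = SPL)) +
  geom_vline(xintercept = 1) +
  geom_hline(yintercept = 1) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(aes(color = group), size = 2, shape = 16) +
  # Same empty-label trick as in the answer, now for both groups
  geom_text_repel(aes(label = ifelse(GeneName %in% top_genes, GeneName, "")),
                  show.legend = FALSE, max.overlaps = Inf) +
  scale_color_manual(values = c(other = "black", `SM up` = "red", `SPL up` = "blue")) +
  scale_x_log10() +
  scale_y_log10()
LC
```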
