Label selected genes in volcano plot from ggplot
5.5 years ago

I have a data frame with the differentially expressed genes from edgeR. I am trying to make a volcano plot from it, but I want only a few selected genes of interest to be labelled on the plot. My data frame looks like this:

head(results)

   Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05

I tried the code below to generate the volcano plot. The plot is generated successfully and the selected genes are marked, but they are labelled with seemingly random numbers instead of their names.

results$genelabels <- ""
results$genelabels <- ifelse(results$Gene == "IL23A" 
                             | results$Gene == "IL1A"
                             |results$Gene == "IL6"
                             |results$Gene == "CD80"
                             |results$Gene == "CD86"
                             |results$Gene == "NFKB"
                             |results$Gene == "BAFT2", TRUE,FALSE)
 ggplot(results) + geom_point(aes(Fold, -log10(FDR),col=sig))+ geom_text_repel(aes(Fold, -log10(FDR)),label = ifelse(results$genelabels == TRUE, results$Gene,""), box.padding = unit(0.45, "lines"),hjust=1) + theme(legend.title=element_blank(),text = element_text(size=20))+ scale_color_manual(values=c("red", "black"))

Can somebody help me work out what is wrong with the code, so that it shows the actual gene names on the volcano plot?

R ggplot2 volcano plot • 19k views
5.5 years ago

Your Gene column is of class factor, not character, so ifelse() returns the factor's underlying integer codes instead of the gene names. If you wrap it in as.character() in your plotting call, it should work.

ggplot(results) +
  geom_point(aes(Fold, -log10(FDR), col = sig)) +
  geom_text_repel(
    aes(Fold, -log10(FDR)),
    label = ifelse(results$genelabels == TRUE, as.character(results$Gene), ""),
    box.padding = unit(0.45, "lines"),
    hjust = 1
  ) +
  theme(legend.title = element_blank(), text = element_text(size = 20)) +
  scale_color_manual(values = c("red", "black"))

Specifically, change the following:

 ifelse(results$genelabels == TRUE, results$Gene, "")

to

 ifelse(results$genelabels == TRUE, as.character(results$Gene), "")
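
To see why the labels come out as numbers, here is a minimal base R sketch on a made-up three-gene vector (the names genes, keep, bad_labels and good_labels are illustrative and not part of the original data). It assumes the Gene column was read in as a factor, which was the default for read.delim() and data.frame() before R 4.0.0. ifelse() drops the factor class of its "yes" argument, so the integer level codes leak through unless the column is converted with as.character() first.

```r
# Toy data: a factor column, as produced with stringsAsFactors = TRUE
# (the default before R 4.0.0). Names are illustrative only.
genes <- factor(c("ADORA2A", "IL23A", "IL1A"))
keep  <- genes %in% c("IL23A", "IL1A")

# ifelse() discards the factor class, so the integer level codes are returned.
bad_labels <- ifelse(keep, genes, "")

# Converting to character first keeps the gene names.
good_labels <- ifelse(keep, as.character(genes), "")

print(bad_labels)
print(good_labels)
```

Printing both vectors side by side should show digits in bad_labels where good_labels has gene names, which matches the "random numbers" seen on the plot.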
5.5 years ago
AK ★ 2.2k

Hi saamar.rajput,

See the answer from benformatics (credit goes to him). The key is to make sure your "Gene" column is of type character rather than factor.

> results <- read.delim("example.txt", header = TRUE, stringsAsFactors = FALSE)
> head(results)
     Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05
> str(results)
'data.frame':   6 obs. of  5 variables:
 $ Gene  : chr  "ADORA2A" "IL23A" "IL1A" "CXCL6" ...
 $ Fold  : num  10.27 8.13 8.43 7.1 8.95 ...
 $ pvalue: num  1.16e-28 4.55e-28 6.77e-27 3.30e-23 9.11e-23 ...
 $ FDR   : num  2.23e-24 4.37e-24 4.33e-23 1.58e-19 3.50e-19 ...
 $ sig   : chr  "FDR<0.05" "FDR<0.05" "FDR<0.05" "FDR<0.05" ...

> results$genelabels <- ""
> results$genelabels <- ifelse(results$Gene == "IL23A" 
+                              | results$Gene == "IL1A"
+                              | results$Gene == "IL6"
+                              | results$Gene == "CD80"
+                              | results$Gene == "CD86"
+                              | results$Gene == "NFKB"
+                              | results$Gene == "BAFT2", TRUE, FALSE)

> ggplot(results) +
+   geom_point(aes(Fold, -log10(FDR), col = sig)) +
+   geom_text_repel(
+     aes(Fold, -log10(FDR)),
+     label = ifelse(results$genelabels, results$Gene, ""),
+     box.padding = unit(0.45, "lines"),
+     hjust = 1
+   ) +
+   theme(legend.title = element_blank(), text = element_text(size = 20)) +
+   scale_color_manual(values = c("red", "black"))

[Image: volcano plot]


Thank you so much :)

5.5 years ago
zx8754 12k

As the other answers pointed out, your Gene column is a factor, and it is being converted to integer codes by the ifelse() inside the ggplot call. Some solutions:

  1. Read the file with stringsAsFactors = FALSE (see @SMK's answer).
  2. Convert the column to character: results$Gene <- as.character(results$Gene) (see @benformatics's answer).
  3. To keep the column as a factor, redefine its levels, as below:

# redefine levels:
results$genelabels <- factor(results$Gene, levels = c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"))

# or convert to character:
# results$genelabels <- ifelse(results$Gene %in% c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"), 
#                              as.character(results$Gene), NA)

ggplot(results, aes(Fold, -log10(FDR), label = genelabels, col = sig)) + 
  geom_point() +
  geom_text_repel(col = "black", na.rm = TRUE, box.padding = unit(0.45, "lines"), hjust = 1) + 
  scale_color_manual(values = c("red", "black")) +
  theme(legend.title = element_blank(), text = element_text(size = 20))
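
The level-redefinition approach relies on factor() turning every value that is not listed in levels into NA. A small base R sketch of just that step, using a toy vector with gene names from the question's table (the variable names genes and selected are illustrative assumptions):

```r
# Toy gene column; gene names are taken from the example table in the question.
genes <- factor(c("ADORA2A", "IL23A", "IL1A", "CXCL6"))
selected <- c("IL23A", "IL1A", "IL6")

# Values not listed in `levels` become NA, so only selected genes keep a label.
genelabels <- factor(genes, levels = selected)

print(as.character(genelabels))
print(levels(genelabels))
```

In the plotting code above, those NA labels are then dropped by geom_text_repel() with na.rm = TRUE, so only the selected genes are annotated. A selected gene that is absent from the data (IL6 here) simply stays as an unused level.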

Other notes:

  • avoid using $ within ggplot
  • use %in% instead of chain of OR conditions |
  • a comparison already returns a logical value, so there is no need for ifelse, e.g. ifelse(x == 1, TRUE, FALSE) is the same as x == 1
  • no need to "initiate" a new column: results$genelabels <- ""
  • readability: use spaces
  • readability: use line-breaks in ggplot for each layer after +.
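
The %in% and ifelse notes can be checked with a short base R sketch on a toy character vector (the names genes, verbose and concise are illustrative, not from the original post):

```r
# Toy character vector; names are illustrative only.
genes <- c("ADORA2A", "IL23A", "IL1A", "CXCL6")

# Chained OR wrapped in ifelse(), in the style used in the question.
verbose <- ifelse(genes == "IL23A" | genes == "IL1A" | genes == "IL6", TRUE, FALSE)

# Equivalent test with %in%; the result is already a logical vector.
concise <- genes %in% c("IL23A", "IL1A", "IL6")

print(identical(verbose, concise))
```

One difference to keep in mind: if the column contains NA, the == comparisons return NA for those rows, whereas %in% returns FALSE.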

Nice tips! 👍 Thanks, zx8754.


Thank you very much, I will keep in mind the things you pointed out :)
