I have a data frame of differentially expressed genes from edgeR. I am trying to make a volcano plot from it, but I want only selected genes of interest to be labelled on the plot. My data frame looks like this:
head(results)
     Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05
I tried the code below to generate the volcano plot. The plot is generated successfully and the selected genes are marked, but they are labelled with seemingly random numbers instead of their gene names.
# Flag the genes of interest (TRUE/FALSE)
results$genelabels <- results$Gene %in% c("IL23A", "IL1A", "IL6", "CD80",
                                          "CD86", "NFKB", "BAFT2")
ggplot(results) +
  geom_point(aes(Fold, -log10(FDR), col = sig)) +
  geom_text_repel(aes(Fold, -log10(FDR)),
                  label = ifelse(results$genelabels == TRUE, results$Gene, ""),
                  box.padding = unit(0.45, "lines"), hjust = 1) +
  theme(legend.title = element_blank(), text = element_text(size = 20)) +
  scale_color_manual(values = c("red", "black"))
Can somebody help me figure out what is wrong with the code, and why it does not show the gene names on the volcano plot?
Thank you so much :)
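
One possible cause, assuming the Gene column is stored as a factor rather than as character (the head() output alone does not confirm this): ifelse() drops the factor attributes and returns the underlying integer level codes, which would explain the numbers on the plot. The sketch below uses a small toy data frame built from the rows shown above, with Gene deliberately made a factor. The names `genes_of_interest`, `selected` and `bad_labels` are introduced here only for illustration.

```r
# Toy data mirroring the rows in the question.
# ASSUMPTION: Gene is a factor; this is what would produce numeric labels.
results <- data.frame(
  Gene = factor(c("ADORA2A", "IL23A", "IL1A", "CXCL6", "CCR7", "IL18R1")),
  Fold = c(10.273854, 8.132293, 8.430078, 7.102900, 8.950486, 6.759646),
  FDR  = c(2.234471e-24, 4.368822e-24, 4.328581e-23,
           1.582588e-19, 3.496235e-19, 2.329001e-18),
  sig  = "FDR<0.05"
)

# Hypothetical helper names, not from the original post.
genes_of_interest <- c("IL23A", "IL1A", "IL6", "CD80", "CD86", "NFKB", "BAFT2")
selected <- results$Gene %in% genes_of_interest

# With a factor, ifelse() yields the integer level codes (as text), not symbols.
bad_labels <- ifelse(selected, results$Gene, "")

# Converting to character first keeps the gene symbols.
results$genelabels <- ifelse(selected, as.character(results$Gene), "")

# Draw the plot only if both packages are installed.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  library(ggrepel)
  p <- ggplot(results, aes(Fold, -log10(FDR))) +
    geom_point(aes(colour = sig)) +
    # The label is mapped inside aes() from the character column.
    geom_text_repel(aes(label = genelabels),
                    box.padding = unit(0.45, "lines")) +
    theme(legend.title = element_blank(), text = element_text(size = 20)) +
    scale_color_manual(values = c("red", "black"))
  print(p)
}
```

If Gene is already a character vector in your session (check with str(results)), this is not the cause and the as.character() call changes nothing.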