Hello,
I am trying to generate a volcano plot with my RNAseq data. Everything works fine but I am having issue with labelling the selected gene names.
Please see below the code I am using:
res <- read.csv("Treated_vs_Control_volcano.csv")
res_plot <- res[order(res$padj), ]
res_plot$col <- 'gray40'
res_plot$col[res_plot$log2FoldChange > 1 & res_plot$padj < 0.05] <- 'orangered'
res_plot$col[res_plot$log2FoldChange < -1 & res_plot$padj < 0.05] <- 'blue'
genes <- c('A89', 'CT76', 'GY7', 'HG54', 'JHI4', 'UYT43', 'CU6', 'RT4')
par(mar = c(4, 4.4, 5, 10), xpd = TRUE)
plot(res_plot$log2FoldChange, -log10(res_plot$padj), col = res_plot$col, pch = 19, cex =1, xlab = expression(log[2]("FC")), ylab = expression(-log[10]("FDR")), cex.lab=1.2, cex.axis=1.2, cex.main=1.5, cex.sub=1.2, frame.plot = FALSE, box(bty="l"))
This creates a nice volcano plot. After that, I add the following code to label the gene names present in the "genes" list created above.
text(res_plot[res_plot$symbol %in% genes,]$log2FoldChange,-log10(res_plot[res_plot$symbol %in% genes,]$padj), pch = 19, cex = 1, col = 'red')
"symbol" is the first column name of my csv file that contains gene symbols.
However, instead of labelling the gene with its symbol, it just labels 1, 2, 3, 4 etc.
Can someone please help me to fix so I could label the gene names?
Many thanks,
I recommend learning ggplot2. You can add text there with
ggrepel
. It's a bit of a learning curve but makes plots so much nicer and more versatile than base R plotting. These kinds of plots and related data munging should be in the skillset of any analyst without any wrapper packages. That having said, if you want it as a wrapper then people often use https://bioconductor.org/packages/release/bioc/html/EnhancedVolcano.html -- still, I encourage you to learn ggplot2, it's a basic skill in bioinfo.