Help with complex Volcano Plot
5 months ago
jmannhei ▴ 10

I have three datasets of DEGs from different experiments. Previously, I had plotted each dataset on its own using the R package EnhancedVolcano. However, my boss now wants me to combine all three datasets into one volcano plot similar to the one below.

[image: example combined volcano plot]

The code was not supplied in the paper, and I do not know of any functionality for this in EnhancedVolcano, so I assume I will have to use ggplot2, but I have been having trouble even plotting one dataset with the numbers. My plan would be to combine all datasets into a single data frame, with columns for log2fc, -log10(adjp), a number corresponding to the genes of interest, and dataset, as follows:

# row.names = 1 uses the first column (gene symbols) as row names
data1 <- read.csv('dataset1.csv', sep = ',', row.names = 1)
data2 <- read.csv('dataset2.csv', sep = ',', row.names = 1)
data3 <- read.csv('dataset3.csv', sep = ',', row.names = 1)
genes=c('STAT1','EGFR','PTEN' ...) # genes of interest

# add a column that numbers the genes of interest and leaves all others as NA
gene_label <- rep(NA, nrow(data1))
gene_num <- seq_along(genes)
for (j in seq_along(genes)) {
  if (genes[j] %in% rownames(data1)) {
    gene_label[which(rownames(data1) == genes[j])] <- gene_num[j]
  }
}

# add a new column identifying the dataset
dataset=rep(1,nrow(data1))

# create new dataset 
DF1<-cbind(data1,gene_label,dataset)
# Will probably have to change row names because data frames don't allow duplicate row names
rows1<-row.names(DF1)
rows1<-paste0(rows1,'_A')
row.names(DF1)<-rows1

# Repeat steps for other datasets ....

# finally combine datasets
DFplot<-rbind(DF1,DF2,DF3)
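
The steps above could also be wrapped in a small function so they are not repeated by hand for each dataset. This is only a minimal sketch under stated assumptions: the toy tables, the column names `log2fc` and `adjp`, and the helper `label_dataset()` are hypothetical stand-ins, not the real data. It uses `match()` in place of the loop and keeps the gene symbol in its own column, which avoids the duplicate row name problem without adding suffixes.

```r
## Toy stand-ins for the three DEG tables; gene symbols are the row names,
## as they would be after read.csv(..., row.names = 1). Values are made up.
make_toy <- function(seed) {
  set.seed(seed)
  data.frame(log2fc = rnorm(5), adjp = runif(5),
             row.names = c("STAT1", "EGFR", "PTEN", "TP53", "MYC"))
}
datasets <- list(A = make_toy(1), B = make_toy(2), C = make_toy(3))
genes <- c("STAT1", "EGFR", "PTEN")  # genes of interest

## Hypothetical helper: number the genes of interest, tag the dataset,
## and move the gene symbols from the row names into a column
label_dataset <- function(df, name, genes) {
  df$gene <- rownames(df)
  df$gene_label <- match(df$gene, genes)  # position in `genes`, NA otherwise
  df$dataset <- name
  rownames(df) <- NULL
  df
}

## Apply the helper to each table and stack the results
DFplot <- do.call(rbind, lapply(names(datasets), function(nm) {
  label_dataset(datasets[[nm]], nm, genes)
}))
```

With the gene symbol stored as a column, the same gene can appear once per dataset and `dataset` can be mapped straight to colour in ggplot2.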

This is where I do not know how to proceed. I do not need different point shapes as in the figure, but I do need the datasets to be different colors, as well as the numbers for the genes of interest. I can add the table afterwards in post-processing, but if anybody knows a way to do it within the figure, that would be great. Thanks

ggplot2 volcano DEGs • 427 views
5 months ago

Here's a bit of an example: let me know if there's anything you don't understand

## this part sets up the dummy data so you don't need to run it 
## but it will show you what format your own data should be for ggplot
## /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
## download example data
dataset1.df <- read.delim("https://raw.githubusercontent.com/sdgamboa/misc_datasets/master/L0_vs_L20.tsv")[,1:3]
## make the data smaller so it's easier to show an example - don't do this with your data
set.seed(1234)
dataset1.df <- dataset1.df[sample(x = 1:nrow(dataset1.df), size = 100, replace = FALSE), ]
## you want three datasets plotted so we'll just make 2 more copies and jumble up the data
dataset3.df <- dataset2.df <- dataset1.df
dataset2.df$logFC <- jitter(dataset2.df$logFC, amount = 2)
dataset2.df$PValue <- 10^-jitter(-log10(dataset2.df$PValue), amount = 1)
dataset3.df$logFC <- jitter(dataset3.df$logFC, amount = 2)
dataset3.df$PValue <- 10^-jitter(-log10(dataset3.df$PValue), amount = 1)
## add a dataset label
dataset1.df$dataset <- "Set1"
dataset2.df$dataset <- "Set2"
dataset3.df$dataset <- "Set3"
## combine the datasets
combo.dataset.df <- rbind(dataset1.df, dataset2.df, dataset3.df)
## /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

library(ggplot2)
library(ggrepel)

group.colours <- c("Set1" = "orange", "Set2" = "orchid", "Set3" = "dodgerblue")

## add some random labels, replace this with whatever your labels are properly
combo.dataset.df$label <- sample(1:15, nrow(combo.dataset.df), replace = TRUE)

## make the ggplot and add the labels
g <- ggplot(combo.dataset.df,
       mapping = aes(x = logFC, y = -log10(PValue), colour = dataset)) +
  geom_point() +
  geom_hline(yintercept = 5, linetype = "dotted") + ## set this for your cutoff of choice
  theme_bw() +
  theme(panel.grid = element_blank(),
        aspect.ratio = 1) + ## makes the plot square, above 1 makes it taller and more narrow, below 1 makes it fatter and shorter
  scale_x_continuous(limits = c(-10,10)) + ## set this for your data
  scale_colour_manual(values = group.colours) +
  labs(x = "Log2 Fold change", y = "-log10(p-value)", colour = "Dataset")

g + geom_text_repel(data = subset(combo.dataset.df, -log10(PValue) > 5), ## set the cutoff for your data
                    mapping = aes(label = label), 
                    show.legend = FALSE)  

[image: example output plot]
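
To swap the random labels for real ones, one option is to number only the genes of interest and leave every other point unlabelled. The following is a rough sketch, not a drop-in solution: the toy data, the `gene` column name, and the gene names in `genes` are all assumptions made for illustration. It also puts a number-to-gene key in the plot caption as a lightweight stand-in for the table in the original figure.

```r
library(ggplot2)
library(ggrepel)

## Toy data in the same shape as combo.dataset.df (values are made up)
set.seed(42)
gene_ids <- paste0("gene", 1:50)
toy.df <- do.call(rbind, lapply(c("Set1", "Set2", "Set3"), function(s) {
  data.frame(gene = gene_ids, logFC = rnorm(50, sd = 3),
             PValue = 10^-runif(50, 0, 8), dataset = s)
}))

## Hypothetical genes of interest; the number is the position in this vector
genes <- c("gene3", "gene17", "gene42")
toy.df$label <- match(toy.df$gene, genes)  # NA for every other gene

## Key linking each number to its gene, shown in the plot caption
key <- paste(paste0(seq_along(genes), " = ", genes), collapse = "    ")

p <- ggplot(toy.df, aes(x = logFC, y = -log10(PValue), colour = dataset)) +
  geom_point() +
  ## label only the rows that belong to a gene of interest
  geom_text_repel(data = subset(toy.df, !is.na(label)),
                  mapping = aes(label = label),
                  show.legend = FALSE) +
  theme_bw() +
  labs(x = "Log2 Fold change", y = "-log10(p-value)",
       colour = "Dataset", caption = key)
p
```

Because the labels inherit the `colour = dataset` mapping, each number is drawn in its dataset's colour, so the same gene can be told apart across the three sets.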


Thanks so much!

