Good Morning, I'm sorry to disturb, but as anyone done these volcano plot? I'm only able to do the traditional one, i'm kind knew too these field. Thanks Ana Sofia Moreira
Good Morning, I'm sorry to disturb, but as anyone done these volcano plot? I'm only able to do the traditional one, i'm kind knew too these field. Thanks Ana Sofia Moreira
Edit 28th March, 2019:
this is now a Bioconductor package: EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling
That looks like an up-side down volcano plot, which is unusual. Why not just do a regular one?
You can use my EnhancedVolcano function on my GitHub page (in part developed with a Junior Bioinformatician): https://github.com/kevinblighe (https://github.com/kevinblighe/EnhancedVolcano)
Apart from anything else, it makes the use of ggrepel
in order to fit as many gene labels as possible.
Ah, apologies, it is not on Bioconductor. You can just copy the R script via git clone or just download it, and then load it into your session with:
source("EnhancedVolcano.R")
I have not yet standardised the code such that it would be ready for Bioconductor. Hope that's okay.
Error in abs(toptable$log2FoldChange) : non-numeric argument to mathematical function i m trying to use this but i get this error ..sorry my bad i had some NA in data frame , now removed but i still dont get the plot...finally managed to get the plot "RStudioGD" something with the graphics output after clearing dev.off() many time somehow it worked
Yes, the function works as follows:
The key is the check_overlap=TRUE/FALSE
parameter that is passed to geom_text()
within the function
If you just want to supply a specific list of gene names, like myCustomLabel=c("GeneA", "", "GeneC", "", "", "GeneF", ...), then you should be able to pass this to the function and use it again in the geom_text() function:
geom_text(data=subset(toptable, padj<LabellingCutoff & abs(log2FoldChange)>FCCutoff),
aes(label=myCustomLabel),
size=2.25,
#segment.color="black", #This and the next parameter spread out the labels and join them to their points by a line
#segment.size=0.01,
check_overlap=TRUE,
vjust=1.0) +
Hi Kevin,
Apology for late response. I manage to solve the problem. My file is in csv format so there is an error when import the file using read.table command. So I just reformat the file into tab deliminated format, reran back the "EnhancedVolcano.R" script and manage to generate cool volcano plot. However the gene IDs were not based on ENSEMBLE ID. DId I do something wrong ?
Hey, that's great! The gene names will be whatever you have used, so, they could be ENSEMBL IDs, RefSeq IDs, etc.
Note that I have updated this function and it is currently being submitted to Bioconductor: https://github.com/kevinblighe/EnhancedVolcano
It is now much more flexible. You can install the first version like this:
require(devtools)
install_github("kevinblighe/EnhancedVolcano")
require(EnhancedVolcano)
?EnhancedVolcano
Hi Kevin
I've tested the https://github.com/kevinblighe/EnhancedVolcano. however the volcano plot generated without any point. I'm using the same dataset for previous run. Here are the command I used
results2<-read.table("res.infected_norm-vs-control-noNA.txt", header=TRUE)
head(results2)
EnhancedVolcano(results2,
lab = rownames(results2),
x = "log2FoldChange",
y = "pvalue",
pCutoff = 0.05,
FCcutoff = 2,
transcriptPointSize = 1.5,
transcriptLabSize = 3.0,
title= "infected vs uninfected")
here you are
> head(results2)
baseMean log2FoldChange lfcSE stat pvalue
ENSMUSG00000033860 1168.9376 -27.24585 4.784735 -5.694328 1.24e-08
ENSMUSG00000057400 993.8006 -26.97533 4.784741 -5.637783 1.72e-08
ENSMUSG00000033831 1201.9378 -26.87136 4.784734 -5.616061 1.95e-08
ENSMUSG00000078520 551.5458 -26.32777 4.784771 -5.502410 3.75e-08
ENSMUSG00000059481 686.4623 -26.24442 4.784757 -5.485006 4.13e-08
ENSMUSG00000113555 807.0622 -25.97442 4.457547 -5.827064 5.64e-09
padj
ENSMUSG00000033860 2.19e-07
ENSMUSG00000057400 2.96e-07
ENSMUSG00000033831 3.32e-07
ENSMUSG00000078520 6.09e-07
ENSMUSG00000059481 6.65e-07
ENSMUSG00000113555 1.06e-07
> str(results2)
'data.frame': 33044 obs. of 6 variables:
$ baseMean : num 1169 994 1202 552 686 ...
$ log2FoldChange: num -27.2 -27 -26.9 -26.3 -26.2 ...
$ lfcSE : num 4.78 4.78 4.78 4.78 4.78 ...
$ stat : num -5.69 -5.64 -5.62 -5.5 -5.49 ...
$ pvalue : num 1.24e-08 1.72e-08 1.95e-08 3.75e-08 4.13e-08 5.64e-09 6.14e-08 8.28e-08 8.95e-08 1.03e-07 ...
$ padj : num 2.19e-07 2.96e-07 3.32e-07 6.09e-07 6.65e-07 1.06e-07 9.57e-07 1.26e-06 1.35e-06 1.54e-06 ...
Thanks! Nothing seems out of the ordinary. I was able to plot the data-points you pasted.
Have you got the ggrepel package installed? - it's required, but wasn't required in the 'pre-development' version that I posted on Biostars.
I can see that I need to improve the code to produce helpful error messages. It's still passing through the Bioconductor submission process.
If worse comes to worse, just continue to use the version that actually worked for you. Thanks!
Hi Kevin, This seems like some great package, however I am running into the same "empty plot" error as above. I am pasting the str(data) results of 'data'(i.e. my input file). Could you please let me know,is the file format is wrong? My code for the plot is same as above:
EnhancedVolcano(data,
lab = rownames(data),
x = "log2FoldChange",
y = "qvalue",
pCutoff = 0.05,
FCcutoff = 2,
transcriptPointSize = 1.5,
transcriptLabSize = 3.0,
title= "Treated vs untreated")
Edit: I don't know how to upload image, so copy pasting the str(data) results.( Sorry)
'data.frame': 58676 obs. of 14 variables:
$ GeneId : Factor w/ 58676 levels "TBIG000001","TBIG000002",..: 641 4468 9003 9153 9274 9365 10275 11524 14429 15694 ...
$ GeneAcc : Factor w/ 58676 levels "ENSG00000000003",..: 16655 5426 4353 36091 13318 6834 10698 622 13091 4184 ...
$ NcbiGene : Factor w/ 26853 levels "1","10","100",..: 17517 22283 20484 17872 9266 25141 14238 7687 10038 17173 ...
$ Type : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ GeneName : Factor w/ 57131 levels "1-Dec","1-Mar",..: 41663 21692 22354 54930 13393 20863 30970 23741 27077 41135 ...
$ Desc : Factor w/ 45340 levels "","-","1-acylglycerol-3-phosphate O-acyltransferase 1 [Source:HGNC Symbol;Acc:HGNC:324]",..: 26809 5467 2846 41883 914 2306 13008 4060 9260 26723 ...
$ EXP.Control.FPKM : Factor w/ 5482 levels "-","0","0.01",..: 3796 3686 5247 599 3716 2286 1010 4941 5169 452 ...
$ EXP.Case.FPKM : Factor w/ 5667 levels "-","0","0.01",..: 1691 4570 1429 553 4117 3494 3340 265 2342 1686 ...
$ DEG.Control.vs.Case.QV.0.05.SELECT: Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ DEG.Control.vs.Case.PV : Factor w/ 6633 levels "-","0.0001","0.0002",..: 2 2 2 2 2 2 2 2 2 2 ...
$ qvalue : num 0.0287 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 ...
$ log2FoldChange : num -4.54 -2.79 -2.47 3.1 -3.04 -2.63 -2.03 -2.95 -1.83 -2.6 ...
$ DEG.Control.vs.Case.UP_DOWN : Factor w/ 3 levels "-","DOWN","UP": 2 2 2 3 2 2 2 2 2 2 ...
$ GeneOntology : Factor w/ 16446 levels "-","GO:0000002,GO:0000122,GO:0000165,GO:0000790,GO:0000977,GO:0000978,GO:0000981,GO:0001077,GO:0001085,GO:0001205,G"| __truncated__,..: 11435 8571 8697 7835 10594 955 8730 11181 8371 8192 ...
Hey, when you say "a lot" [of your rows are NA], you equally imply that not everything is NA (?). Perhaps, therefore, the problem lies elsewhere outside of EnhancedVolcano()
.
What does the following return:
table( is.na(as.character(data$GeneId)) )
table( is.na(data$GeneId) )
Thank you for your reply,Kevin.
table( is.na(as.character(data$GeneId)) ) returns < table of extent 0 >
table( is.na(data$GeneId) ) returns < table of extent 0 >
I wonder what could the problem outside enhanced volcano be. My input is a csv file with the columns; GeneName, log2FoldChange, qvalue,Control Vs Case UP_Down
column "Control Vs Case UP_Down" just tells me whether a gene is up regulated or downregulated.
table( as.character(data$GeneId) == 'NA' )
also gives me < table of extent 0 >
I was given the csv file with above listed columns, so am not very sure which procedure was used to arrive at the results in csv file. Can we not convert factors into a type that is required by EnhancedVolcano function(i.e. if factors are creating a problem)?
thanks for the quick reply. i am fairly new to this so i had difficulty understanding how things work here. anyway, yes, i already have my DE data, which i already filtered. it consist of my_transcript_name, log2FC, & FDR value. since my data is already filtered, i just wanted to plot it using volcano plot. i started with the basic command from your script because i just want to draw out of something with my data. however, i couldn't arrive with a figure.
results <- read.table("my_data", header=TRUE, sep = '\t')
toptable <- as.data.frame(results)
install.packages("ggplot2")
library(ggplot2)
plot <- ggplot (toptable, aes(x=log2FC,y=FDR))
Oh, this code is now a comprehensive package on Bioconductor: https://github.com/kevinblighe/EnhancedVolcano
In EnhancedVolcano, for the most basic plot, you would do:
library(EnhancedVolcano)
EnhancedVolcano(toptable,
lab = rownames(toptable),
x = 'log2FC',
y = 'FDR')
You may have to install the package first.
Try to install not via install.packages()
if (!requireNamespace('BiocManager', quietly = TRUE))
install.packages('BiocManager')
BiocManager::install('EnhancedVolcano')
In the unlikely event that that does not work, this will work:
devtools::install_github('kevinblighe/EnhancedVolcano')
(you may need to install devtools first)
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
You can check ggplot2 documentation, ggplot2 tutorial, ggrepel. These are enough to plot what you have asked for.
Another link: Creating Interactive Volcano Plots with R and Plot.ly
One more: Repel overlapping text labels in ggplot2
Could you edit your question by adding some test data and also what you already tried.
Hi Kevin, I am trying to use your code with my data but I am not being able to produce any plot (eventhough I don't get any error messages back). So I wonder wether you could help me to localize my problem. The code I am using is the following
The DEseq2 output has the regular structure, being: Contig baseMean log2FoldChange lfcSE stat pvalue padj NReads FBgn0000003 135701.1304 3.841593895 0.218300714 17.59771566 2.56E-69 3.72E-66 513393 etc.
Thanks in advance,
Jordi Planells
Hi Jordi,
First ensure that all devices are switched off by running
dev.off()
until there are no more devices to close.Then, you should try to run the function like this:
Does that work?
Sorry for late reply I've been out for some days. I am afraid it doesn't work, I have used dev.off() untill I got this message back: Error in dev.off() : cannot shut down device 1 (the null device). Still I don't get any output. I am not able to figure out why
Can you show the exact sequence of commands that you're using? - sorry
Of course!
and the plot commands exactly the same code you posted in github
Thank you so much again