6.3 years ago
Kevin Blighe
NB - please post questions on Bioconductor Support: https://support.bioconductor.org/
[ direct link to create a post: https://support.bioconductor.org/p/new/post/?tag_val=EnhancedVolcano ]
EnhancedVolcano
GitHub https://github.com/kevinblighe/EnhancedVolcano
Bioconductor
- https://bioconductor.org/packages/release/bioc/html/EnhancedVolcano.html (release)
- https://bioconductor.org/packages/devel/bioc/html/EnhancedVolcano.html (devel)
Great tool, thanks for putting it together!
My DESeqDataSet is actually a set of peaks instead of transcripts, all of which have a unique identifier going out to 6 figures. I was wondering if there was a way to use `selectLab()` to create custom labels, such as TF names for factors I know to bind within those peaks, or even small representations of their PWMs (though the former would be sufficient for the moment), or does that have to come from an additional metadata column in the DESeqDataSet? Thanks!
Hey, thanks for the comments. My colleague Myles back in London deserves the credit for the initial idea of putting this together.
`selectLab` will just match up to whatever you have passed to the required `lab` parameter. So, it can be anything really, but there is no functionality to automatically pull in TF names. That is a good idea, though, and it would show in a nice way which TFs were up- or down-regulated.
I mean, this package has only been released for a few weeks at this stage, and I'm fairly open as to where it could be developed further. I was hoping to build something as comprehensive as the ComplexHeatmap package.
Hi Kevin,
Thanks again for the amazing tool!
Can I use EnhancedVolcano with plotly? I assume it's built using ggplot2, and I've tried to do `ggplotly(plot)`, where `plot` is a volcano plot made using EnhancedVolcano.
I get the following error message though:
Being able to use these volcano plots with plotly would be super useful! Especially when there are too many DEGs and it really makes labeling messy.
Thanks!
EnhancedVolcano does indeed return a ggplot2 object, on which extra features can be added. It also utilises ggrepel - perhaps that is the missing link? I have not tried with plotly but will make an attempt later to see how to coerce the volcano object to work with plotly.
I believe there are a few tutorials around where plotly is used to generate a volcano, though. I think that Stephen Turner had one, but cannot find it right now.
Aha! Thank you so much, it's working now :)
Please use Add Comment or Add Reply as appropriate instead of Add Answer.
May I ask how you managed to transform the EnhancedVolcano object with ggplotly into a plotly object? I got stuck on the very same error and the ggrepel hint does not ring a bell...
This helps? - Does EnhancedVolcano() supports crosstalk()?
and i already used it....
Good work dude!
Hi Kevin,
Great tool! I've been pretty much using it for all my volcano plots. I wanted to know if there was a way to use EnhancedVolcanos with results from sleuth?
cheers!
You're welcome. Yes, most likely there is a way. What are the columns in the output of Sleuth?
This is the output for sleuth, where b is equivalent to Log2FC
Hey, well, you have at least 2 required columns: one for the label and one for `y`.
What is missing is the fold-change for `x`! I have never used Sleuth; however, I searched the Web forums and am surprised to see that the Sleuth developers do not output a fold-change for Sleuth's results. You may consider using a different differential expression analysis program.
That's unfortunate. Thanks for looking though!
You could likely still use the `b` value, in which case you should change the x-axis label too. You may want to read through this thread on Google Groups: https://groups.google.com/forum/#!topic/kallisto-sleuth-users/kWodd7CQejE (Harold Pimentel is a Sleuth developer)
Hi Kevin, Thanks for this great package. A basic question: you specify in the vignette that the default p-value cut-off is 0.05, but from the default plot, it looks to me as if it were 0.005. Is this a misunderstanding on my part? Best wishes, Patrick
Hey Patrick, thanks for noticing that. The default is actually:
I just need to update the text in the vignette!
Thanks, Kevin, I've just seen that, having read through the whole vignette: all makes sense!
I am updating that part of the vignette right now, so, the change will come through on the Vignette on GitHub in the next few minutes: https://github.com/kevinblighe/EnhancedVolcano
It will be until Bioc 3.10 before it is changed on the main Bioconductor branch.
Awesome package, Kevin! Is it possible to remove the log2FC cutoff and lines?
Hey, thank you!
To remove all cut-off lines, you just need to do:
If you want to disable the actual cut-off itself (and, thus, the colouring of the points based on the cut-off), then I would recommend the use of `colCustom`; there is an example in the vignette.
Another possibility is to set `FCcutoff` to something crazy like 1 000 000 such that nothing passes it.
Is there any chance to change the tick intervals for both the horizontal and vertical axes? And thanks a lot, this is awesome!
Hey, EnhancedVolcano is [thankfully] fully compatible with ggplot2 functionality. So, to modify, e.g., the y-axis, you can do:
Hopefully this helps!
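As a minimal sketch of that ggplot2 approach: because `EnhancedVolcano()` returns a ggplot2 object, tick intervals can be changed by adding scale layers to it. The toy data frame and the object `p` below are placeholders standing in for a real volcano object, and the break values are arbitrary.

```r
# Toy stand-in for a volcano plot; with real data, `p` would be the
# object returned by EnhancedVolcano().
df <- data.frame(
  log2FC = c(-3.2, -1.1, 0.2, 1.4, 2.9),
  pvalue = c(1e-8, 1e-3, 0.6, 1e-4, 1e-9)
)

# Tick positions: every 1 unit on the x-axis, every 2 units on the y-axis.
x_breaks <- seq(-4, 4, by = 1)
y_breaks <- seq(0, 10, by = 2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = log2FC, y = -log10(pvalue))) + geom_point()
  # Adding scale layers sets the tick positions and axis limits.
  p <- p +
    scale_x_continuous(breaks = x_breaks, limits = range(x_breaks)) +
    scale_y_continuous(breaks = y_breaks, limits = range(y_breaks))
}
```

When a plot already carries an x or y scale, ggplot2 replaces it with the newly added one and prints a message saying so; that message is expected and harmless here.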
Hi Kevin, EnhancedVolcano package was really helpful. I have two doubts:
when I used the below option legend=c('NS','Log (base 2) fold-change','P value', 'P value & Log (base 2) fold-change') it has printed Log (base 2) but not the Log2 (subscript). May I know how to do it.
when I have modified pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)) as pointSize = c(ifelse(res$log2FoldChange>2 & res$padj>0.05, 8, 1)) it worked. But when I have used pointSize = c(ifelse(res$log2FoldChange>2 & res$log2FoldChange<-2 & res$padj>0.05, 8, 1)) it did not work. May I know how to do it.
Thanks in advance!
I implemented that a few days ago, so, it is only currently in the development version, which you can install via:
Note: you may have to install the devtools package.
There are now 2 parameters for the legend:
Expressions (equations, super- and sub-script, etc.) can only be used in `legendLabels`. It may seem silly to have 2 parameters for the legend, but it relates to how ggplot2 (which is the underlying 'engine' behind EnhancedVolcano) utilises legends.
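A short sketch of how a subscripted "Log2" could be written with plotmath expressions, assuming the result is then passed as `legendLabels`. The label wording is only illustrative.

```r
# Mixing plain strings and expression() in c() yields an expression vector;
# Log[2] is rendered by plotmath as "Log" with a subscript 2.
legend_labels <- c(
  "NS",
  expression(Log[2] ~ fold ~ change),
  "P value",
  expression(P ~ value ~ and ~ Log[2] ~ fold ~ change)
)

# Intended use (not run here):
# EnhancedVolcano(res, ..., legendLabels = legend_labels)
```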
---------------------------
I think that you may want to do:
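One possible form, as a sketch with made-up values in `res`. Note that R parses `x<-2` as an assignment, not as "less than minus two", so the negative threshold needs a space (`x < -2`) or `abs()`. The `padj > 0.05` comparison is copied from the question; flip it to `<` if the large points should be the significant ones.

```r
# Made-up results table with the two columns used in the question.
res <- data.frame(
  log2FoldChange = c(3.1, -2.7, 0.4, 2.5, -3.3),
  padj           = c(0.20, 0.30, 0.90, 0.01, 0.001)
)

# abs() covers both directions: log2FC > 2 or log2FC < -2.
big <- abs(res$log2FoldChange) > 2 & res$padj > 0.05

# Size 8 for the selected points, 1 for everything else.
point_size <- ifelse(big, 8, 1)

# padj can be NA in DESeq2 output; fall back to the small size there.
point_size[is.na(point_size)] <- 1

# Intended use (not run here):
# EnhancedVolcano(res, ..., pointSize = point_size)
```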
Hi Kevin,
Thank you for the response.
Both of your suggestions worked :)
I guess, there should not be '>' before 8.
Thank you!
Great. Oh, yes, not sure why I put the `>` there.
Hi Kevin,
I have two questions:
When I plot with results object (after shrinkage), the number of genes (visible on the plot) passing the cutoff criteria are relatively less when compared to the plot with results object (without shrinkage). The graph looks nice with shrinkage and much more dispersed without shrinkage. May I know how to overcome this issue.
When we use FCcutoff = 1.2, it represents log2FC (x-axis). But I want to set the cutoff for upregulation as log2FC = 0.585 (FC > 1.5) and for downregulation as log2FC = -1 (FC < 0.5). Is it possible to give different log2FC cutoff values for up- and downregulation?
Thanks in advance and looking forward to hearing from you.
Hey Bhanu,
The first question is more for the DESeq2 developer. However, the idea of `lfcShrink()` is, generally, to produce more realistic fold changes.
For the second part, you would have to avail of the `colCustom` parameter and assign the colours before running the `EnhancedVolcano()` function. You could then also draw your own custom cutoff fold change lines with the following:
To get rid of the main cut-off lines, just set `cutoffLineType = 'blank'`
There are many examples in the vignette: https://github.com/kevinblighe/EnhancedVolcano
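A rough sketch of the asymmetric cut-off idea, using the thresholds from the question (0.585 for up, -1 for down). The data, colours, and group names are made up, and using `vline` for the custom lines is my assumption about which parameter fits here.

```r
# Made-up results table.
res <- data.frame(
  log2FoldChange = c(1.4, 0.7, 0.3, -0.8, -1.6),
  pvalue         = c(1e-6, 1e-4, 0.2, 1e-3, 1e-8),
  row.names      = paste0("gene", 1:5)
)

# Assign one colour per point, with different thresholds for up and down.
keyvals <- ifelse(res$log2FoldChange > 0.585, "red2",
           ifelse(res$log2FoldChange < -1, "royalblue", "grey30"))

# colCustom expects a named vector; the names become the legend entries.
names(keyvals)[keyvals == "red2"]      <- "up"
names(keyvals)[keyvals == "royalblue"] <- "down"
names(keyvals)[keyvals == "grey30"]    <- "NS"

if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
  p <- EnhancedVolcano::EnhancedVolcano(
    res,
    lab = rownames(res),
    x = "log2FoldChange",
    y = "pvalue",
    colCustom = keyvals,
    cutoffLineType = "blank",  # hide the default symmetric cut-off lines
    vline = c(-1, 0.585)       # draw custom vertical lines instead
  )
}
```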
Hi Kevin,
Thank you for the response. Now I could use the colCustom parameter and do the required by following the examples in the Github. Thank you again!
Hi Kevin,
Thank you so much for the EnhancedVolcano package - it's great! I have a quick question, I need to generate a high-res image of my volcano plot for publication and was wondering how to do that from EnhancedVolcano. I'm an R newbie and would very much appreciate any tips! I apologize if this is a bit off-topic.
Thank you! Sandra
Hey Sandra, I would generate the figure as a PDF with the `pdf()` function. PDFs are vector-based, not pixelated, so they look 'perfect' when zoomed in and can also be easily manipulated by a journal graphics team. Also make use of the `width` and `height` arguments that are passed to the `pdf()` function.
Alternatively, I show how one can generate multiple volcanos side-by-side in the vignette via grid and gridExtra, e.g., HERE.
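A minimal sketch of the `pdf()` workflow. A base R scatter plot stands in for the volcano here; with a real EnhancedVolcano object `p`, the `plot()` call would be replaced by `print(p)`. The file name and the 8 x 8 inch size are arbitrary choices.

```r
# Write to a temporary file; use a real path such as "volcano.pdf" in practice.
out_file <- file.path(tempdir(), "volcano.pdf")

# width and height are given in inches.
pdf(out_file, width = 8, height = 8)

# Placeholder plot; with a volcano object `p`, use print(p) instead.
plot(
  x = c(-3.2, -1.1, 0.2, 1.4, 2.9),
  y = -log10(c(1e-8, 1e-3, 0.6, 1e-4, 1e-9)),
  xlab = "log2 fold change",
  ylab = "-log10 p-value"
)

# Closing the device is what actually finishes writing the file.
invisible(dev.off())
```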
Hi Kevin, Thank you so much for the quick reply. The pdf looks beautiful! I have another question, I'm trying to add connectors (to fit more gene name labels)
but it gives me an error message:
Not sure what I'm doing wrong. Works great otherwise. Thanks again!
Great! Oh, which version are you using? - I changed the name of that argument. In your version, it may be `DrawConnectors`.
You can install the most up to date version of the package with:
aha! thank you so much, it's working now :)
Hi Kevin,
I frequently face the following error:
Can you please suggest on how to rectify it?
Hey, which code are you running, exactly?
Please use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
Thank you Ram :). I will take it as a learning experience.
Whoa! I think that I saw the problem instantly when I re-ran using my own dataset. Here is what I got:
Your label sizes are way too large. Here it is with smaller label sizes:
Hi Kevin, Thank you! It is working now. Regards, Bhanu
Hi,
Does anyone know how to colour points that have FC>2.5 AND are above the p-value cut-off in 1 colour, and at the same time use a different colour for genes with FC<2.5 AND are above the p-value cut-off? I can see how to do the former from the instructions above, and colour by FC, but not both FC and p-value. I tried using keyvals for this but keep getting error messages.
Also, my labelling only uses row numbers rather than the rowname. I have gene symbols rather than ENS numbers - could this be the problem?
Thanks in advance! Rebecca
Hello, the way to do this is via `keyvals`. Can you paste a sample of your data, and also the error messages that you have been receiving?
The 'ENS' IDs are Ensembl gene IDs. The IDs that you use can be anything, but I have not tested EnhancedVolcano in situations where there are no gene names at all.
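As a sketch of the `keyvals` idea for colouring by both fold change and p-value: the vector is built with nested `ifelse()` calls and then passed as `colCustom`. The data, the p-value threshold of 1e-5, and the colours are made up; the 2.5 threshold comes from the question.

```r
# Made-up results table with gene symbols in their own column.
res <- data.frame(
  symbol         = c("A", "B", "C", "D"),
  log2FoldChange = c(3.0, 1.2, -3.5, 0.1),
  pvalue         = c(1e-8, 1e-7, 1e-9, 0.5)
)

# Points that pass the p-value cut-off.
sig <- res$pvalue < 1e-5

# One colour for significant points above the FC threshold, another for
# significant points at or below it, and grey for everything else.
keyvals <- ifelse(sig & res$log2FoldChange > 2.5, "red2",
           ifelse(sig, "royalblue", "grey30"))

# The names become the legend entries.
names(keyvals)[keyvals == "red2"]      <- "p and FC > 2.5"
names(keyvals)[keyvals == "royalblue"] <- "p only"
names(keyvals)[keyvals == "grey30"]    <- "NS"

# Intended use (not run here); lab = res$symbol labels points by gene symbol:
# EnhancedVolcano(res, lab = res$symbol, x = "log2FoldChange", y = "pvalue",
#                 colCustom = keyvals)
```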
Hi, thanks for such a pretty and clear volcano plot! I am using it in my project write up.
I am using labelled volcano as my major one and I want smaller volcanos just to see trends in different conditions without labels, but I want to make the plot look consistent.
I was wondering if I can remove the labels of genes? I tried to remove 'lab=rownames(mydata)' in R but it keeps giving me errors. I tried putting it as <na> but it also did not work.
I'm sorry if it is a really basic question, I am a complete beginner in R coding:'(
Thank you.
Hey, sure thing and no problem, you just need to use:
Hi, I would like to have all point log2FC >= 1 and p value 0.05 in one color and all log2FC <= -1 and p value 0.05 in another one. I tried to run the code above but it gives me this error. I am able to run a classic enhanced plot with the green/blue/red color. if I remove the labels part, it gives me this error:
if I include the "labels" part :
I wanted to get the same volcano plot as previously shown but I cannot get it.
Hey, sorry, I am only seeing this now. What is the output of:
?
By the way, you should not have to set the value of `legend`, depending on which version you are using. Which version is it, do you know?
Great tool! I am trying "Custom shape & colour over-ride". I have 4 cell types. Can you tell me how to label the gene names for the different shapes?
Hey, did you check the vignette for how you could do this? - https://github.com/kevinblighe/EnhancedVolcano
I had given `selectLab`, but it's not labelling the gene names. Also, when I am switching the x axis with the y axis, it is giving a warning -
Sure, but, what is the output of:
Also, whatever is the output should be a subset of `res5$symbol`.
Regarding the warning message, it may imply that one of your fold change values is infinite or NA, and indicates that one gene / variable was removed.
Hi Kevin
Awesome tool! Thanks so much.
I have run the sample data (airway) and could reproduce the results. However, I am having trouble applying it to my data. I have Lipid (instead of gene) in the first column and I am trying to plot the other two columns, log2FC vs. pvalue_neglog. I have replaced lab = rownames(res1) from the example data with res$Lipid (this returns the lipid names, i.e., the labels for the data points on the volcano plot). I am getting a warning message and an empty plot. For your reference, I am also pasting the values in the three required columns (lipid names, log2FC and pvalue_neglog). I would really appreciate it if you could please let me know what could be going wrong here. Being a biologist, I'm fairly new to R!
Cheers Farheen
Thanks! - the problem is that your y-axis values are already negative log [base10]. EnhancedVolcano does this conversion internally for you.
You can get around this by converting your y-values back to p-values:
Then:
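A sketch of both steps, using the column names from the question (`Lipid`, `log2FC`, `pvalue_neglog`). The lipid names and values are made up, and the new column name `pvalue` is my own choice.

```r
# Made-up table in which the y-values are already -log10(p).
res <- data.frame(
  Lipid         = c("lipid_1", "lipid_2", "lipid_3"),
  log2FC        = c(1.8, -0.4, -2.2),
  pvalue_neglog = c(4, 0.5, 6)
)

# Undo the transformation: if y = -log10(p), then p = 10^(-y).
res$pvalue <- 10^(-res$pvalue_neglog)

# Plot with the back-converted p-values; EnhancedVolcano applies -log10 itself.
if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
  p <- EnhancedVolcano::EnhancedVolcano(
    res,
    lab = res$Lipid,
    x = "log2FC",
    y = "pvalue"
  )
}
```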
Thanks for your reply kevin. Yes that seemed to have fixed the issue. Also, if I may ask, could you please let me know how we set a cutoff for logFC and p value? The default ones do not seem to be applying directly to my data.
No problem. These cut-offs are set via `pCutoff` and `FCcutoff`. There are no standards in terms of values to choose for these.
Hi Kevin
I have been able to plot the data as a volcano plot. However, I have some data points which share the exact same adjusted p value and a similar FC. And so the labels are probably getting overlapped and one label is obscuring the one closest to it. Please let me know if I can change the coordinates of the annotations/labels or is there a way to spread them out so all the labels are visible.
Cheers Farheen
Kevin Blighe in your vignette and here, you typically show examples with quite large FC cutoffs, and the default FC cutoff value in EnhancedVolcano is normally way more than one should consider as a starting point for a gene being biologically DE. What is usually indicated as an FC cutoff for a likely biologically DE gene is something like FC > 1.1 or 1.2 (i.e. 10% or 20% change).
Is your choice of very large FC cutoffs in your vignette examples and function default setting just arbitrary?
The default value is 1, which, if log [base 2] fold changes are supplied, represents a cut-off of absolute log [base 2] fold change of 1. The cut-off value here is again tied to whichever fold changes are supplied, be they linear, log [base 2], or something else.
For the values specified in the vignette, I undoubtedly chose these based on the dataset whose data I am analysing in the vignette, i.e., the data contained in the airway package.
Ok, I see, you aren't assuming people are supplying log FC and could be linear (even though DESeq2, edgeR, and limma each give results in log2FC)
True, but I note that people are using EnhancedVolcano for a wide diversity of studies, including metabolomics, microbiome, proteomics, etc., and that fold changes are not always on the log [base 2] scale in these.
Also, why do the vignette examples and the function default `pCutoff` assume using raw p-values? Shouldn't adjusted p-values be used instead, and the default `pCutoff` be 0.05 or similar?
No, the typical way to generate volcano plots is based on nominal / un-adjusted / 'raw' p-values. However, the `pCutoff` parameter will represent whatever p-values are supplied, be they adjusted or nominal / un-adjusted / 'raw'.
But then the appropriate cutoff always changes depending on the dataset because you haven't done multiple testing correction. And you have to look in your results each time for the appropriate raw p-value cutoff that represents significant DE (by basically comparing it to the padj column to find the right cutoff).
What's the reasoning behind volcano plots based on raw p-values? Seems easier, more consistent for people to understand why the cutoff was chosen (<0.05 for example), and appropriate to use adjusted p-values.
Yes, the cut-off needs to vary for each comparison that is being graphically represented by the volcano.
Regarding raw p-values, it relates to how, after p-value adjustment, many p-values may end up with constant values, such as having, for example, many genes set to 0.03, which makes the volcano look unusual.
Hi Kevin Blighe - is there a way to color a specific set points a different color irrespective of where they are on the graph while keeping the other points with their default colors?
I first tried testing `colCustom` to see if I could recreate the default green, red, blue, and grey colors using nested `ifelse` statements with the log2FC and p-value thresholds, but, weirdly, there are some angular point coloring artifacts near the threshold dotted lines, and it doesn't look like the same plot without `colCustom`. If that worked, I was going to use `colCustom` to do the same thing, except that the `ifelse` statements would use the default colors for everything but the points of interest.