I used this R script on RNA-seq data to generate a volcano plot,
but there are 3 outlier points distorting the shape of the plot. How can I remove them?
library(DESeq2)
library(ggplot2)
resultado_table <- data.frame(results(dds, contrast=c("condition", "Controle", "Infectado")))
resultado_table$Variable <- "Controle_vs_Colombiana"
## Highlight genes that have an absolute fold change > 2 (|log2FC| > 1) and an adjusted p-value < 0.05
resultado_table$threshold <- as.factor(abs(resultado_table$log2FoldChange) > 1 & resultado_table$padj < 0.05)
write.table(resultado_table, file="./GENES_DE")
volcano_plot <- ggplot(data=resultado_table, aes(x=log2FoldChange, y=-log10(padj), colour=threshold)) +
  geom_point(alpha=0.4, size=1.75) +
  xlab("\n log2 fold change") +
  ylab("-log10 p-value adjusted \n") +
  theme_bw() +
  theme(axis.title.y = element_text(face="bold", size=16),
        axis.title.x = element_text(face="bold", size=16),
        axis.text = element_text(size=12),
        legend.title = element_blank(),
        legend.text = element_text(size=12)) +
  facet_wrap(~ Variable, ncol=3) +
  theme(strip.text = element_text(size=12, face="bold"))
volcano_plot
You can just use the xlim() and ylim() options to set the bounds, so you don't have to literally remove the data points.
Thank you Devon, as always you are helping me. But where exactly do I use these for them to actually work?
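In ggplot2, xlim() and ylim() are added to the plot with + like any other component, for example after geom_point(). A minimal sketch, assuming made-up data and arbitrary bounds of -5 to 5 and 0 to 10 (the data frame `toy` and both bounds are illustrative, not taken from the script above):

```r
library(ggplot2)

# Made-up stand-in for resultado_table; the real table comes from DESeq2 results().
set.seed(1)
toy <- data.frame(
  log2FoldChange = c(rnorm(200), 25, -30, 28),            # three extreme points
  padj           = c(runif(200, 1e-6, 1), 1e-3, 1e-4, 1e-2)
)

base_plot <- ggplot(toy, aes(x = log2FoldChange, y = -log10(padj))) +
  geom_point(alpha = 0.4, size = 1.75)

# Option A: xlim()/ylim() set the scale limits; points outside them are
# dropped from the plot (ggplot2 warns about removed rows when drawing).
plot_trimmed <- base_plot + xlim(-5, 5) + ylim(0, 10)

# Option B: coord_cartesian() only zooms the view; no data are discarded.
plot_zoomed <- base_plot + coord_cartesian(xlim = c(-5, 5), ylim = c(0, 10))
```

Either way the underlying data frame is left untouched, so the written results table still contains every gene.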
No problem. I should note there's a third option that's used by DESeq2 and a few other programs. In short, you modify the values that you're plotting such that anything outside of the bounds you want is now exactly on the border. You then denote that by changing the symbol used for these points. An example of that with an MA plot is below, where triangles denote values outside of the bounds.
I should note that this image is originally from the thread "Deseq's plotMA color-coding at random?".
I'm actually trying to create a volcano plot in ggplot2 that INCLUDES outliers and displays them as triangles at the axis border. How can I code this? Thanks!
Assuming you have everything in a dataframe and the values are the "values" column:
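A minimal sketch of that idea, assuming a data frame named `df`, made-up example values, and an arbitrary upper bound of 50 (all three are illustrative):

```r
# Made-up example data; `values` is the column plotted on the y axis.
df <- data.frame(x = 1:6, values = c(3, 12, 75, 40, 120, 8))

bound <- 50  # arbitrary upper bound, chosen only for illustration

# Flag out-of-bounds points first, while the original values are still intact.
df$shape <- ifelse(df$values > bound, "triangle", "circle")

# Then move those points onto the border.
df$values[df$values > bound] <- bound
```

The order matters: once the values have been replaced, nothing exceeds the bound any more, so the flag has to be computed beforehand.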
You then use shape=shape in the aesthetic.
Thanks, it works!
What if I wanted to display the outliers like this for both the x and y axes?
It's the same general idea. You take whatever holds your X axis values and replace anything outside the bounds with the boundary value, just as for the Y axis. I picked 50 as an arbitrary value.
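Applied to both axes, the approach might look like the sketch below. The column names `logFC` and `negLogP`, the x bound of 1.5, and the example values are assumptions made for illustration:

```r
# Made-up example data: logFC on the x axis, negLogP (-log10 p-value) on the y axis.
df <- data.frame(
  logFC   = c(-4.0, -0.5, 0.2, 1.0, 3.2),
  negLogP = c(12, 3, 80, 20, 65)
)

x_bound <- 1.5  # hypothetical symmetric bound for the x axis
y_bound <- 50   # arbitrary upper bound for the y axis

# Flag a point if it is out of bounds on either axis, before changing any values.
df$shape <- ifelse(abs(df$logFC) > x_bound | df$negLogP > y_bound,
                   "triangle", "circle")

# Trim both tails of x, then the upper tail of y.
df$logFC[df$logFC >  x_bound] <-  x_bound
df$logFC[df$logFC < -x_bound] <- -x_bound
df$negLogP[df$negLogP > y_bound] <- y_bound
```

The x axis needs two replacement steps because fold changes can be out of bounds in either direction, whereas -log10 p-values only run off the top.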
Hmm, when I apply the code to both axes, all of my points just appear as a vertical line at x=1.5
Here's my code:
A couple things:
- Don't set shape multiple times like that. You need to set just a subset of shape the second time: df$shape[abs(df$logFC)>1.5] <- "triangle"
- Use df$logFC[df$logFC>1.5] <- 1.5 and df$logFC[df$logFC < -1.5] <- -1.5.
- [...] xlim() part.
If for some reason you still get a line at x=1.5, then just have a look at df$logFC after each of the two trimming steps. It should then be apparent where things are going wrong and why.
Got it. The line of points was due to me screwing up the cutoff for replacing values and then forgetting to reset them to their original values before trying the code again.
I ended up having to set the xlim and ylim to the values that give the best zoom on the majority of data points. I had 10 or so points that were very distant from the rest, hence my finding this thread! Thanks for the help!
Here's my code and the volcano plot it produced for anyone's reference:
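The sketch below is not the poster's original code but a minimal illustration of the approach discussed in this thread. It assumes simulated data, a data frame named `res`, and bounds of 1.5 (x) and 50 (y). It writes the clamped values to new columns instead of overwriting the originals, which sidesteps the "forgot to reset the values" problem when the script is re-run:

```r
library(ggplot2)

# Simulated stand-in for a differential-expression table (not real data).
set.seed(42)
res <- data.frame(
  logFC = c(rnorm(500, sd = 0.6), 6, -7, 5),
  padj  = c(runif(500, 1e-8, 1), 1e-60, 1e-75, 1e-3)
)
res$negLogP <- -log10(res$padj)

x_bound <- 1.5  # assumed x-axis bound
y_bound <- 50   # assumed y-axis bound

# Clamp into new columns so the original values are never modified
# and the script gives the same result every time it is run.
res$x_plot <- pmax(pmin(res$logFC, x_bound), -x_bound)
res$y_plot <- pmin(res$negLogP, y_bound)

# A point is out of bounds if clamping changed either coordinate.
res$shape <- ifelse(res$x_plot != res$logFC | res$y_plot != res$negLogP,
                    "out of bounds", "in bounds")

volcano <- ggplot(res, aes(x = x_plot, y = y_plot, shape = shape)) +
  geom_point(alpha = 0.4, size = 1.75) +
  scale_shape_manual(values = c("in bounds" = 16, "out of bounds" = 17)) +
  xlab("log2 fold change") +
  ylab("-log10 adjusted p-value") +
  theme_bw()
```

Printing `volcano` draws the plot, with triangles sitting on the border wherever a point was clamped.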