Hi all, I have a quick practical and conceptual question. What do you do about visual outliers in your volcano plots?
I have 4 sets of not-so-pretty differential gene expression data that I would like to present as 4 publication-ready volcano plots. Most of my data is close to the origin, but there are 1-3 points per plot that cross the significance threshold and sit far outside the center of mass of the plot. I have not conducted a formal outlier analysis on these points, so for now I've been calling them "visual outliers". Default plotting of all the points in my data set results in a zoomed-out plot that doesn't allow the reader to appreciate the center of mass of the scatter.
My question is: what do you typically do about this?
Is there a standard way to treat these points? I have to imagine that outright removal of data is fraudulent, so cropping without mention is probably not the right choice.
Is there a ggplot2 extension package for making publication-ready zoom plots, insets, axis breaks, etc. that you have used before with success?
Have you tried using lfcShrink if you're using DESeq2? I've found that those visual outliers tend to go away after lfcShrink.
Unfortunately, I don't have access to the DESeq2 data objects for these data, only the final results table. And I don't think my measly computer has enough RAM to run the DESeq2 analysis from the raw counts. I appreciate the input, but at this time I don't know if this will be a viable solution for my current situation.
Is it an outlier because of the log2 fold change (x-axis) or because of the p-value (y-axis)?
If it's a p-value issue, make sure you plot the adjusted p-value. If the adjusted p-value is still extremely small, you can just cap it at a fixed bound (it doesn't really matter whether it's 10^-10 or 10^-6; you're rejecting the null hypothesis anyway) and mention that in the figure legend.
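Here is a minimal sketch of that capping step, in Python rather than R and assuming you only have a column of adjusted p-values from the results table. The function name cap_neg_log10 and the default floor of 1e-10 are made up for illustration; pick whatever bound you state in your legend. Missing values (NA) would need to be dropped beforehand.

```python
import math

def cap_neg_log10(padj, floor=1e-10):
    """Convert adjusted p-values to volcano-plot heights, capped at -log10(floor).

    Returns two lists of the same length as padj:
      heights: -log10(p) with p clamped so it is never below `floor`
      capped:  True where the original value was below `floor`
    `floor` is an arbitrary, illustrative bound and should be reported in the legend.
    """
    heights = []
    capped = []
    for p in padj:
        capped.append(p < floor)
        # max() also protects against p == 0.0 from numerical underflow
        heights.append(-math.log10(max(p, floor)))
    return heights, capped

if __name__ == "__main__":
    heights, capped = cap_neg_log10([1e-300, 0.01, 0.5])
    for h, c in zip(heights, capped):
        print(f"{h:.2f}", "(capped)" if c else "")
```

The capped flags can be used to draw those points with a different marker (for example a triangle at the top edge), so the reader can see which values were truncated.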
Also, for DESeq2, you don't have to run it on your own computer. Run it on the cloud! You could probably even get DESeq2 working on the free Google Colab!
I don't recall DESeq2 being a RAM-intensive application. I've run it on measly standard laptops plenty of times. By the way, these "outliers" (genes with very high DE?) are typically the reason one does the experiment in the first place, no? Something I've seen in the past is a broken axis (squiggly lines across the axis indicating a breakpoint), so that you can display two ranges. But it's obviously a custom plot, and you have to make sure the ranges are clear.
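For what it's worth, here is a rough sketch of the broken-axis idea using Python and matplotlib (an assumption on my part, since the question was about ggplot). It draws the same points in two side-by-side panels that share a y-axis, gives each panel its own x-range, and adds slanted marks at the break. The function name broken_axis_volcano and the example ranges are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def broken_axis_volcano(lfc, neg_log10_padj, main_xlim, outlier_xlim):
    """Volcano plot with a break in the x-axis.

    main_xlim:    (low, high) range showing the bulk of the points
    outlier_xlim: (low, high) range showing the far-away points
    Returns the figure and both axes.
    """
    fig, (ax_main, ax_out) = plt.subplots(
        1, 2, sharey=True,
        gridspec_kw={"width_ratios": [4, 1], "wspace": 0.05},
    )
    # Draw all points in both panels; the x-limits decide what is visible.
    for ax in (ax_main, ax_out):
        ax.scatter(lfc, neg_log10_padj, s=8)
    ax_main.set_xlim(*main_xlim)
    ax_out.set_xlim(*outlier_xlim)

    # Hide the spines facing the break so the panels read as one axis.
    ax_main.spines["right"].set_visible(False)
    ax_out.spines["left"].set_visible(False)
    ax_out.tick_params(left=False)

    # Slanted marks at the top and bottom of the break.
    d = 0.5
    marks = dict(marker=[(-d, -1), (d, 1)], markersize=10, linestyle="none",
                 color="k", mec="k", mew=1, clip_on=False)
    ax_main.plot([1, 1], [0, 1], transform=ax_main.transAxes, **marks)
    ax_out.plot([0, 0], [0, 1], transform=ax_out.transAxes, **marks)

    ax_main.set_xlabel("log2 fold change")
    ax_main.set_ylabel("-log10 adjusted p-value")
    return fig, ax_main, ax_out

if __name__ == "__main__":
    fig, _, _ = broken_axis_volcano(
        [-1.2, -0.3, 0.1, 0.8, 12.0], [1.5, 0.2, 0.1, 2.5, 8.0],
        main_xlim=(-3, 3), outlier_xlim=(10, 14),
    )
    fig.savefig("volcano_broken_axis.png", dpi=300)
```

Because both panels keep their own tick labels, the two ranges stay readable, but the break should still be mentioned in the figure legend.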