What to do about "outlier" values in volcano plot?
16 months ago
ivingan • 0

Hi all, I have a quick practical and conceptual question. What do you do about visual outliers in your volcano plots?

I have 4 sets of not-so-pretty differential gene expression data that I would like to present as 4 publication-ready volcano plots. Most of my data is close to the origin, but there are 1-3 points per plot that cross the significance threshold yet sit far outside the center of mass of the plot. I have not conducted a formal outlier analysis on these points, so for now I've been calling them "visual outliers". Default plotting of all the points in my data set results in a zoomed-out plot that doesn't allow the reader to appreciate the center of mass of the scatter.

My question is what do you typically do about this?

Is there a standard way to treat these points? I have to imagine that outright removal of data is fraudulent, so cropping without mention is probably not the right choice.

Is there a ggplot2 extension package that makes publication-ready zoom plots, insets, axis breaks, etc. that you have used before with success?

R ggplot2 transcriptomics • 3.0k views

Have you tried using lfcShrink if you're using DESeq2? I've found that those visual outliers tend to go away after lfcShrink.

Unfortunately, I don't have access to the DESeq2 data objects for these data, only the final results table. And I don't think my measly computer has enough RAM to run the DESeq2 analysis from the raw counts. I appreciate the input, but at this time I don't know if this will be a viable solution for my current situation.

Is it an outlier because of the log2 fold change (x-axis) or because of the p-value (y-axis)?

If it's a p-value issue, make sure you plot the adjusted p-value. If the adjusted p-value is still extremely small, you can cap it at a fixed bound, which puts an upper limit on -log10(p) on the y-axis (it doesn't really matter whether it's 10^-10 or 10^-6; you're rejecting the null hypothesis anyway), and mention that in the figure legend.
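The capping idea above could be sketched roughly as follows, working only from a final results table. The data frame, its values, and the `1e-10` floor are made up for illustration; the column names `log2FoldChange` and `padj` are assumed to follow DESeq2's results table.

```r
# Hypothetical results table; column names assumed to match DESeq2 output.
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(0.4, -1.2, 2.5, 1.8),
  padj           = c(0.60, 0.03, 1e-40, 1e-8)
)

p_floor <- 1e-10  # assumed cap; pick a value and state it in the figure legend

# Flag the genes that will be capped, so they can be marked in the plot.
res$p_capped <- !is.na(res$padj) & res$padj < p_floor

# Raise any adjusted p-value below the floor up to the floor itself.
res$padj_plot <- pmax(res$padj, p_floor)

# y-axis value for the volcano plot; it can no longer exceed -log10(p_floor).
res$neg_log10_padj <- -log10(res$padj_plot)
```

Keeping the original `padj` column untouched and plotting from `padj_plot` means the capped genes can still be reported with their real values in a table or legend.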

Also, for DESeq2, you don't have to run it on your own computer. Run it on the cloud! You could probably even get DESeq2 working on the free Google Colab!

I don't recall DESeq2 being a RAM-intensive application. I've run it on measly standard laptops plenty of times. By the way, these "outliers" (genes with very high DE?) are typically the reason one does the experiment in the first place, no? Something I've seen in the past is a broken axis (squiggly lines across the axis indicating a breakpoint), so that you can display two ranges. But it's obviously a custom plot, and you have to make sure the ranges are clear.

15 months ago

Hi,

You could probably fix the x-axis (fold-change or effect size) to, say, -4 to +4, and then set any values below -4 or above +4 to -4 or +4, respectively. You could then also use a different shape for these points in the plot, and add a footnote explaining that this was done to improve visualisation and interpretation.

This is actually what the default MA plot function from DESeq2 does: points that fall outside the plotting window are drawn as open triangles at the edge of the plot.
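A minimal sketch of this clamping approach in ggplot2, assuming a results table with `log2FoldChange` and `padj` columns (the data below are invented, and `lfc_limit`, `clamped` and `lfc_plot` are names chosen just for this example):

```r
# Hypothetical results table with two extreme fold changes.
res <- data.frame(
  log2FoldChange = c(-0.5, 0.8, 1.5, -2.0, 9.3, -7.1),
  padj           = c(0.40, 0.20, 0.01, 0.001, 1e-6, 1e-4)
)

lfc_limit <- 4  # x-axis bound, as in the -4 to +4 suggestion above

# Flag points beyond the bounds, then pin them to the nearest bound.
res$clamped  <- abs(res$log2FoldChange) > lfc_limit
res$lfc_plot <- pmin(pmax(res$log2FoldChange, -lfc_limit), lfc_limit)

# Plotting is skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(res, aes(x = lfc_plot, y = -log10(padj), shape = clamped)) +
    geom_point() +
    # shape 16 = filled circle (in range), shape 2 = open triangle (clamped)
    scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 2), guide = "none") +
    scale_x_continuous(limits = c(-lfc_limit, lfc_limit)) +
    labs(
      x = "log2 fold change",
      y = "-log10(adjusted p-value)",
      caption = sprintf(
        "Open triangles: |log2FC| > %g, drawn at the axis limit.", lfc_limit
      )
    )
  print(p)
}
```

The original `log2FoldChange` column is left as-is, so the true values of the clamped genes can still be labelled or listed in the legend.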

Kevin

I really like this solution, especially the use of the open triangles. It has the same visual syntax as a video game mini-map, with icons on the bounds of the map indicating that the POI is out of range. I wonder if non-video-game players would have an intuitive understanding of that symbolism. I'm going to try it!

Okay, I trust that it will be a success.
