Question

Volcano plot

3.7 years ago

sarahawan92 ▴ 10

I have made a volcano plot for the significantly differentially expressed genes. There are 493 upregulated and 298 downregulated genes at a P-value cutoff of 0.05 and a log fold change cutoff of 2.5. Kindly give me feedback on the volcano plot, as it does not look like the plots I usually see in papers. I am also including the command I used for the volcano plot.

Command:

library(ggplot2)
ggplot(data, aes(x = log2FoldChange, y = -log10(Pvalue), col = diffexpressed)) +
  geom_point() + theme_minimal(base_size = 12, base_rect_size = 5) +
  geom_vline(xintercept = c(-2.5, 2.5), col = "black", linetype = "dashed") +
  geom_hline(yintercept = -log10(0.05), col = "blue", linetype = "dashed") +
  scale_color_manual(values = c("lightblue", "black", "dark red"))

volcano plot

Hi • 5.6k views

ADD COMMENT • link updated 3.7 years ago by LauferVA 4.8k • written 3.7 years ago by sarahawan92 ▴ 10

It seems you have removed the genes with log2FoldChange between -2.5 and 2.5 from the data object.

ADD REPLY • link 3.7 years ago by MiladAD ▴ 10

If the genes had been removed, there would be no black line at the base. Actually, removing them is an option if you want a cleaner plot.

ADD REPLY • link 3.7 years ago by PR ▴ 50

Thanks for your feedback. Yes, I am also confused by this flat black line at the bottom. Any suggestions?

ADD REPLY • link 3.7 years ago by sarahawan92 ▴ 10

Well, I think that black line is just a whole lot of points with a p-value of 1, i.e. genes that are not significant. If you convert that to -log10(p-value), you get zero. When there are so many such points, spanning log2FC values from approximately -2.5 to 2.5, they form the black line.
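A quick way to check this explanation in R is sketched below. The column names `log2FoldChange` and `Pvalue` follow the command in the question; the values in the toy data frame are invented purely for illustration.

```r
# Toy data: invented values, only to illustrate the arithmetic
toy <- data.frame(
  log2FoldChange = c(-4.0, -1.2, 0.3, 1.8, 3.5),
  Pvalue         = c(0.001, 1, 1, 1, 0.0004)
)

# A p-value of exactly 1 maps to 0 on the -log10 scale
toy$neg_log10_p <- -log10(toy$Pvalue)

# How many points sit on the y = 0 baseline, and over what fold-change range?
on_baseline <- toy[toy$Pvalue == 1, ]
n_baseline  <- nrow(on_baseline)
lfc_range   <- range(on_baseline$log2FoldChange)

print(n_baseline)
print(lfc_range)
```

Running the same two summaries on the real results table would show how many genes make up the flat line and where they sit on the x-axis.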

ADD REPLY • link 3.7 years ago by PR ▴ 50

Thanks. I just want to double-check: is the plot fine as it is?

ADD REPLY • link 3.7 years ago by sarahawan92 ▴ 10

score 1 · Answer 1 · 2022-01-08

There is good reason to suspect there is a problematic assumption, supposition, error, etc. in your code. While it is very possible that it reflects a valid practice, I would not publish the data until I could explain why this was occurring for myself.

Claim: It is more likely that such a sharp demarcation is the result of a questionable data quality control or association testing procedure than biological reality.

Justification: Your data are telling you that p = 1 is reserved only for genes having -3 < LFC < 3. However, a p-value is a statistic that combines several characteristics (the mean and variance of each observation, how many observations there are in each state, etc.). While some of these same elements are used to estimate LFC, others are not. As such, a sharp demarcation like this tends not to be observed.

If it were me I would look at these places and scan for assignments that could produce a pattern like this. I'd check ...

The segment of code that handles correction for multiple testing
The segment of code after association testing, but before plotting (for example once you have the DE results, but before you have filtered results into genes you want to plot, and genes you don't want to plot).
The segment of code before association testing, that creates rules for which genes to include, and which to remove.
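As a starting point for the checks listed above, here is a minimal sketch in base R. The helper `summarise_p_one()` and the simulated data frame `sim` are hypothetical names made up for this example. The Bonferroni step shows just one way a pile-up at exactly p = 1 can arise (adjusted p-values are capped at 1); whether anything like this happened in the original analysis is not known from the thread.

```r
# Hypothetical helper: summarise the genes whose p-value is exactly 1
summarise_p_one <- function(res, p_col = "Pvalue", lfc_col = "log2FoldChange") {
  p <- res[[p_col]]
  is_one <- !is.na(p) & p == 1
  list(
    n_total   = nrow(res),
    n_na      = sum(is.na(p)),
    n_p_one   = sum(is_one),
    lfc_range = if (any(is_one)) range(res[[lfc_col]][is_one]) else c(NA_real_, NA_real_)
  )
}

# Simulated data (not real results): uniform raw p-values, random fold changes
set.seed(1)
sim <- data.frame(
  log2FoldChange = rnorm(1000, sd = 2),
  raw_p          = runif(1000)
)

# Bonferroni multiplies each p-value by the number of tests and caps it at 1
sim$bonferroni <- p.adjust(sim$raw_p, method = "bonferroni")

raw_check  <- summarise_p_one(sim, p_col = "raw_p")
bonf_check <- summarise_p_one(sim, p_col = "bonferroni")

print(raw_check$n_p_one)   # raw p-values essentially never equal exactly 1
print(bonf_check$n_p_one)  # capped adjusted values pile up at exactly 1
```

Calling `summarise_p_one()` on the real results table, before and after each filtering step, would show at which stage the p = 1 genes and their restricted LFC range appear.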

Once I understood why the pattern is observed, I might or might not be comfortable publishing the figure. At minimum, though, this is also worth doing in case one of your reviewers has the same question...

score 0 · Answer 2 · 2022-01-07

Not sure what feedback you are asking for. But generally, the volcano plot looks as expected to me. You have a bunch of genes with p-value 1 that form the line at Y-axis zero value, which is okay. In terms of general interpretation, you are looking for genes that have a low p-value (or higher up on the Y-axis) and a good log fold change (further right or left, away from the center).
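To make that interpretation concrete, here is a minimal sketch of how a status column like the `diffexpressed` variable in the question could be derived from the two cutoffs mentioned there (P-value 0.05, log2 fold change 2.5). The question does not show how that column was built, so the function name `classify_de()` and the labels `UP` and `DOWN` are assumptions for this example; `NO` is the label mentioned elsewhere in the thread.

```r
# Hypothetical helper: label each gene by significance and direction
classify_de <- function(lfc, p, lfc_cut = 2.5, p_cut = 0.05) {
  sig <- !is.na(p) & !is.na(lfc) & p < p_cut
  status <- rep("NO", length(lfc))
  status[sig & lfc >  lfc_cut] <- "UP"
  status[sig & lfc < -lfc_cut] <- "DOWN"
  factor(status, levels = c("DOWN", "NO", "UP"))
}

# Toy data: invented values for illustration
toy <- data.frame(
  log2FoldChange = c(-3.1, -3.1, 0.4, 2.9, 2.9),
  Pvalue         = c(0.01, 0.20, 1, 0.03, 0.60)
)
toy$diffexpressed <- classify_de(toy$log2FoldChange, toy$Pvalue)

print(table(toy$diffexpressed))
```

Only genes that pass both cutoffs are labelled `UP` or `DOWN`; a large fold change with a high p-value stays `NO`.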

score 0 · Answer 3 · 2022-01-07

3.7 years ago

cpad0112 21k

1. Change the legend as per scientific notation.
2. Change the point colors. Light blue is not easily legible. Change it to either standard red and green or adopt a color-blind-friendly palette.
3. Use expression() to make the log bases subscript.
4. There seems to be some cut-off at the bottom. For a full/classical volcano plot, plot every point.
5. Highlight the top 10, or some reasonable number, of genes on either side (using ggrepel).
6. If point 5 is not possible, highlight the genes of interest to you.
7. Keep the dashed lines, with increased thickness. Maybe you can use grey lines.
8. You can remove "NO" from the legend, as you are not highlighting those genes.
9. Use a light color for genes that didn't change, to increase the color contrast between unchanged and changed genes.
10. Change the axis text. Suggested ones: Fold change (log2, with the 2 in subscript), P-value (log10, with the 10 in subscript).
11. Instead of p-values, see if you can use adjusted p-values.
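Several of the suggestions above can be combined in one plot. The following is only a sketch on simulated data, not the original dataset. The columns `gene` and `padj` are assumed names, the BH adjustment via `p.adjust()` stands in for whatever the DE tool reports, and the two highlight colors are taken from the Okabe-Ito color-blind-friendly palette.

```r
library(ggplot2)
library(ggrepel)

# Simulated results table (invented values, for illustration only)
set.seed(42)
n <- 500
de <- data.frame(
  gene           = paste0("gene", seq_len(n)),
  log2FoldChange = rnorm(n, sd = 2),
  Pvalue         = runif(n)^3
)
de$padj <- p.adjust(de$Pvalue, method = "BH")  # adjusted p-values

# Status labels based on adjusted p-value and fold-change cutoffs
de$diffexpressed <- "NO"
de$diffexpressed[de$padj < 0.05 & de$log2FoldChange >  2.5] <- "UP"
de$diffexpressed[de$padj < 0.05 & de$log2FoldChange < -2.5] <- "DOWN"

# Up to 10 changed genes with the smallest adjusted p-values, for labelling
changed <- de[de$diffexpressed != "NO", ]
top <- head(changed[order(changed$padj), ], 10)

p <- ggplot(de, aes(x = log2FoldChange, y = -log10(padj), colour = diffexpressed)) +
  geom_point(alpha = 0.7) +
  # `linewidth` needs a recent ggplot2; older versions use `size` for lines
  geom_vline(xintercept = c(-2.5, 2.5), colour = "grey40",
             linetype = "dashed", linewidth = 0.8) +
  geom_hline(yintercept = -log10(0.05), colour = "grey40",
             linetype = "dashed", linewidth = 0.8) +
  geom_text_repel(data = top, aes(label = gene), show.legend = FALSE) +
  # Light grey for unchanged genes; "NO" is left out of the legend via `breaks`
  scale_colour_manual(
    values = c(DOWN = "#0072B2", NO = "grey80", UP = "#D55E00"),
    breaks = c("DOWN", "UP"),
    name   = NULL
  ) +
  # plotmath expressions give subscripted log bases on the axes
  labs(x = expression(log[2] ~ "fold change"),
       y = expression(-log[10] ~ "adjusted P-value")) +
  theme_minimal(base_size = 12)

print(p)
```

All points are plotted, including the unchanged ones, so the full volcano shape stays visible while the changed genes stand out.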

ADD COMMENT • link 3.7 years ago by cpad0112 21k

Thanks. Your feedback is very informative. Could you kindly clarify point 4? The genes shown in black at the bottom look flat, which does not happen in a classical volcano plot. I need your suggestions as I am writing a manuscript. Thanks.

ADD REPLY • link 3.7 years ago by sarahawan92 ▴ 10

See my answer above.

ADD REPLY • link 3.7 years ago by PR ▴ 50

If you are writing for a manuscript, use a journal-specific color palette in ggplot. Example libraries are ggpubr (https://github.com/kassambara/ggpubr) and ggsci (https://github.com/nanxstats/ggsci).
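As a small illustration, a ggsci palette can be attached to a plot like any other ggplot2 scale. The sketch below uses `scale_color_npg()` (the Nature Publishing Group inspired palette from ggsci) on a tiny invented data frame. Which palette suits a given journal is a judgment call, and the column names here are placeholders.

```r
library(ggplot2)
library(ggsci)

# Toy data: invented values, only to show the palette being applied
toy <- data.frame(
  log2FoldChange = c(-3.2, 0.1, 3.4),
  neg_log10_p    = c(4.0, 0.1, 5.0),
  diffexpressed  = c("DOWN", "NO", "UP")
)

p <- ggplot(toy, aes(x = log2FoldChange, y = neg_log10_p, colour = diffexpressed)) +
  geom_point(size = 3) +
  scale_color_npg() +   # journal-style discrete palette from ggsci
  theme_minimal(base_size = 12)

print(p)
```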