Volcano plot
2.9 years ago
sarahawan92 ▴ 10

I have made a volcano plot of the significantly differentially expressed genes. There are 493 upregulated and 298 downregulated genes, using cutoffs of P-value 0.05 and log fold change 2.5. Kindly give me feedback on the volcano plot, as it does not look like the plots I usually see in papers. I am also including the command I used for the volcano plot.

Command:

library(ggplot2)
# Volcano plot: log2 fold change vs. -log10(p-value), colored by DE status
ggplot(data, aes(x = log2FoldChange, y = -log10(Pvalue), col = diffexpressed)) +
  geom_point() + theme_minimal(base_size = 12, base_rect_size = 5) +
  geom_vline(xintercept = c(-2.5, 2.5), col = "black", linetype = "dashed") +
  geom_hline(yintercept = -log10(0.05), col = "blue", linetype = "dashed") +
  scale_color_manual(values = c("lightblue", "black", "dark red"))

[Figure: volcano plot]


It seems you have removed the genes with log2FoldChange between -2.5 and 2.5 from the data object.


If the genes were removed, then there would be no black line at the base. Actually, that is an option you can do to make a cleaner plot. You can remove those genes if you want.


Thanks for your feedback. Yes, I am also confused by this flat black line at the bottom. Any suggestions?


Well, I think that black line is just a whole lot of points that have a p-value of 1. Genes that are not significant. If you convert that to -Log10(pvalue), you get zero. When there are so many such points, and they span a range of log2FC values between approx. -2.5 to 2.5, they make the black line.
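To check this explanation on your own results, a minimal sketch in R is below. It uses a small made-up data frame `res` with the same column names as in the question (`log2FoldChange`, `Pvalue`); the values are invented for illustration only, so swap in your real results table.

```r
# Toy results table; values are invented purely for illustration.
res <- data.frame(
  log2FoldChange = c(-3.1, -1.2, 0.4, 2.0, 3.5),
  Pvalue         = c(0.001, 1, 1, 1, 0.0004)
)

# A p-value of exactly 1 maps to zero on the y-axis of the volcano plot.
res$negLog10P <- -log10(res$Pvalue)

# How many genes sit on the baseline, and what log2FC range do they span?
on_baseline    <- res$Pvalue == 1
n_baseline     <- sum(on_baseline)
baseline_range <- range(res$log2FoldChange[on_baseline])

print(n_baseline)
print(baseline_range)
```

If `n_baseline` is large and `baseline_range` roughly matches the width of the black line, the line is simply many overplotted points at y = 0.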


Thanks. I just want to double-check: is this plot fine as it is?

2.9 years ago
LauferVA 4.5k

There is good reason to suspect there is a problematic assumption, supposition, error, etc. in your code. While it is very possible that it reflects a valid practice, I would not publish the data until I could explain why this was occurring for myself.

Claim: It is more likely that such a sharp demarcation is the result of a questionable data quality control or association testing procedure than biological reality.

Justification: Your data are telling you that p = 1 is reserved for only genes having -3 < LFC < 3. However, p-value is a statistic that combines several characteristics (mean and variance of each obs, how many obs there are in each state, etc.). While some of these same elements are used to estimate LFC, others are not. As such, such a sharp demarcation tends not to be observed.

If it were me I would look at these places and scan for assignments that could produce a pattern like this. I'd check ...

  1. The segment of code that handles correction for multiple testing
  2. The segment of code after association testing, but before plotting (for example once you have the DE results, but before you have filtered results into genes you want to plot, and genes you don't want to plot).
  3. The segment of code before association testing, that creates rules for which genes to include, and which to remove.
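One quick way to see whether any of these steps left a hard boundary in the results is to cross-tabulate "p equals 1" against "beyond the fold-change cutoff". The sketch below is only an illustration: `check_baseline` is a hypothetical helper, the column names are taken from the question, the 2.5 cutoff is assumed, and the toy data are invented.

```r
# Hypothetical helper: summarise where the p = 1 genes fall relative to an
# absolute log2FC cutoff, and count missing p-values.
check_baseline <- function(res, lfc_cutoff = 2.5) {
  p_is_one <- !is.na(res$Pvalue) & res$Pvalue == 1
  beyond   <- abs(res$log2FoldChange) >= lfc_cutoff
  list(
    n_missing_p = sum(is.na(res$Pvalue)),
    crosstab    = table(
      p_is_one      = factor(p_is_one, levels = c(FALSE, TRUE)),
      beyond_cutoff = factor(beyond,   levels = c(FALSE, TRUE))
    )
  )
}

# Toy data, invented for illustration only.
res <- data.frame(
  log2FoldChange = c(-3.1, -1.2, 0.4, 2.0, 3.5, 0.1),
  Pvalue         = c(0.001, 1, 1, 1, 0.0004, NA)
)
out <- check_baseline(res)
print(out$n_missing_p)
print(out$crosstab)
```

If the `TRUE`/`TRUE` cell is empty while the `TRUE`/`FALSE` cell holds many genes, every p = 1 gene lies inside the cutoff. That is the sharp demarcation described above, and it would suggest looking for an explicit assignment or filter upstream.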

Once I understood why the pattern is observed, I might or might not be comfortable publishing the figure. At minimum, though, this is also worth doing in case one of your reviewers has the same question...


Mods - question about this post. Would it have been better to divide into two separate questions?

2.9 years ago
PR ▴ 50

Not sure what feedback you are asking for. But generally, the volcano plot looks as expected to me. You have a bunch of genes with p-value 1 that form the line at Y-axis zero value, which is okay. In terms of general interpretation, you are looking for genes that have a low p-value (or higher up on the Y-axis) and a good log fold change (further right or left, away from the center).

2.9 years ago
  1. Change the legend as per scientific notations
  2. Change the point colors. Light blue is not visually legible. Change it to either standard Red and green or adopt a color blind friendly palette
  3. Use expression for making log bases subscript
  4. There seems to be some cut-off at the bottom. For full/classical volcano plot, plot every point
  5. Highlight top 10 or some reasonable number of genes on either side (using ggrepel)
  6. If point 5 is not possible, highlight genes of your interest.
  7. Keep the dashed lines, but increase their thickness. Maybe you can use grey lines
  8. You can remove "NO" from legend as you are not highlighting them
  9. Use light color for genes that didn't change for increasing the color contrast between unchanged and changed genes.
  10. Change the axis text. Suggested ones: Fold change (log2 - 2 in subscript), P-value (log10- 10 in subscript).
  11. Instead of p-value, see if you can use adjusted p values.
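Several of these points can be combined in one ggplot2 call. The following is a rough sketch, not a definitive recipe: the data are simulated, the column names and cutoffs are copied from the question, and the colours are an assumed colour-blind-friendly choice. It covers points 2, 3, 7, 8, 9, 10 and 11.

```r
library(ggplot2)

set.seed(1)  # simulated data, for illustration only
n <- 500
res <- data.frame(
  log2FoldChange = rnorm(n, sd = 2),
  Pvalue         = runif(n)^3
)

# Point 11: Benjamini-Hochberg adjusted p-values (base R).
res$padj <- p.adjust(res$Pvalue, method = "BH")

# Classify genes; cutoffs copied from the question.
lfc_cut <- 2.5
p_cut   <- 0.05
res$diffexpressed <- "NO"
res$diffexpressed[res$padj < p_cut & res$log2FoldChange >=  lfc_cut] <- "UP"
res$diffexpressed[res$padj < p_cut & res$log2FoldChange <= -lfc_cut] <- "DOWN"
res$diffexpressed <- factor(res$diffexpressed, levels = c("DOWN", "NO", "UP"))

p <- ggplot(res, aes(x = log2FoldChange, y = -log10(padj), colour = diffexpressed)) +
  geom_point(alpha = 0.7) +
  # Point 7: grey dashed threshold lines.
  geom_vline(xintercept = c(-lfc_cut, lfc_cut), colour = "grey40", linetype = "dashed") +
  geom_hline(yintercept = -log10(p_cut), colour = "grey40", linetype = "dashed") +
  # Points 2, 8, 9: contrasting colours, light grey for unchanged genes,
  # and "NO" left out of the legend via `breaks`.
  scale_colour_manual(
    values = c(DOWN = "#0072B2", NO = "grey80", UP = "#D55E00"),
    breaks = c("DOWN", "UP"),
    name   = NULL
  ) +
  # Points 3 and 10: subscripted log bases in the axis titles.
  labs(x = expression(log[2] ~ "fold change"),
       y = expression(-log[10] ~ "adjusted P-value")) +
  theme_minimal(base_size = 12)

print(p)
```

For point 5, gene labels could be added on top of this with `ggrepel::geom_text_repel()`, assuming the results table has a gene-name column to label from.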

Thanks, your feedback is very informative. Could you kindly clarify point 4? The genes in black at the bottom look flat, which does not happen in a classical volcano plot. I need your suggestions as I am writing a manuscript. Thanks.


See my answer above.


If you are writing for a manuscript, use a journal-specific color palette in ggplot. Example libraries are ggpubr (https://github.com/kassambara/ggpubr) and ggsci (https://github.com/nanxstats/ggsci).


Thanks for sharing the information.
