Producing such a plot in R
4.1 years ago
zizigolu ★ 4.3k

Hello

I have the proportion of samples altered for a list of genes, with the related p-values, like this:

gene    CNV            -log10_pvalue  Percentage_altered
CDKN2B  Deletion       3              69
CDKN2A  Deletion       3              69
RPL22   Deletion       0.087568       33
GATA6   Amplification  2.974694135    44
EGFR    Amplification  1.958607315    42
CCND1   Amplification  2.999132278    36
CDK6    Amplification  2.795880017    30
GATAD1  Amplification  2.795880017    30
KRAS    Amplification  2.999132278    22
MYB     Amplification  1.677780705    16
GATA4   Amplification  1.091514981    13
MYC     Amplification  2.22184875     52
CCNE1   Amplification  -0.000434077   0
TSHZ3   Amplification  -0.000434077   0
ERBB2   Amplification  -0.000434077   0

I want to visualise this data like the plot below, but I don't know how.

enter image description here

Any help please?


It seems to me that this is a mix of an inverted volcano plot and a bubble plot. In my experience, these two links can help you get there:

1. https://www.r-graph-gallery.com/320-the-basis-of-bubble-plot.html

2. https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html

You will definitely need to tweak the code. Is there a GitHub link in the paper you are referring to? Digging into that might also give some leads.


You are currently missing the variable they used on their y-axis.


It does not seem so. The y-axis here shows the frequency of gain and deletion (%), which in the OP's data is the last column (Percentage_altered), if I understand correctly.


Edit: I think the y-axis and the point size are the same variable, but on the y-axis they effectively make the percentage negative for deletions and positive for gains.
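A minimal base R sketch of that sign flip, assuming the published figure was built this way (the thread does not confirm it). The three rows are taken from the question's table; the variable names `cnv`, `pct` and `net_frequency` are chosen here for illustration.

```r
# Three rows from the question's table: CNV type and percentage of samples altered
cnv <- c("Deletion", "Deletion", "Amplification")
pct <- c(69, 33, 44)

# Negative for deletions, positive for amplifications, expressed as a fraction
net_frequency <- ifelse(cnv == "Deletion", -pct, pct) / 100

print(net_frequency)  # -0.69 -0.33 0.44
```

The unsigned percentage can still be mapped to point size, so the bubble area stays positive on both sides of zero.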

4.1 years ago

The example data.

df <- structure(list(gene = c("CDKN2B", "CDKN2A", "RPL22", "GATA6", 
"EGFR", "CCND1", "CDK6", "GATAD1", "KRAS", "MYB", "GATA4", "MYC", 
"CCNE1", "TSHZ3", "ERBB2"), CNV = c("Deletion", "Deletion", "Deletion", 
"Amplification", "Amplification", "Amplification", "Amplification", 
"Amplification", "Amplification", "Amplification", "Amplification", 
"Amplification", "Amplification", "Amplification", "Amplification"
), log10_pvalue = c(3, 3, 0.087568, 2.974694135, 1.958607315, 
2.999132278, 2.795880017, 2.795880017, 2.999132278, 1.677780705, 
1.091514981, 2.22184875, -0.000434077, -0.000434077, -0.000434077
), Percentage_altered = c(69L, 69L, 33L, 44L, 42L, 36L, 30L, 
30L, 22L, 16L, 13L, 52L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-15L))

ggplot2 answer

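library("tidyverse")  # dplyr (mutate, %>%) and ggplot2
library("ggrepel")    # geom_text_repel() for non-overlapping labels

df %>%
  # Signed frequency: negative for deletions, positive for amplifications
  mutate(net_frequency=ifelse(CNV == "Deletion", -Percentage_altered/100, Percentage_altered/100)) %>%
  ggplot(aes(x=log10_pvalue, y=net_frequency)) +
    geom_point(aes(size=Percentage_altered, color=log10_pvalue)) +
    # Label only genes with p < 0.05, i.e. -log10(p) > -log10(0.05)
    geom_text_repel(aes(label=ifelse(log10_pvalue > -log10(0.05), gene, "")), force=10) +
    geom_hline(yintercept=0, lty=2) +  # dashed line separating gains from deletions
    theme_classic()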

enter image description here


Thank you so much

How can I put the gene name on the corresponding bubble, please?


I edited the post to include the gene names for genes with a p-value < 0.05.
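For reference, a small sketch of how that cutoff behaves on the -log10 scale. It uses four rows from the example data; the names `cutoff`, `genes`, `neg_log10_p` and `labels` are chosen here for illustration.

```r
# p < 0.05 corresponds to -log10(p) > -log10(0.05)
cutoff <- -log10(0.05)
print(round(cutoff, 3))  # 1.301

# Four genes from the example data with their -log10 p-values
genes <- c("CDKN2B", "RPL22", "MYB", "GATA4")
neg_log10_p <- c(3, 0.087568, 1.677780705, 1.091514981)

# Genes below the cutoff get an empty label, so nothing is drawn for them
labels <- ifelse(neg_log10_p > cutoff, genes, "")
print(labels)  # "CDKN2B" "" "MYB" ""
```

To label every gene instead, the `ifelse()` can be dropped so that the label is simply the gene column.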


Sorry, this is my full data:

gene    CNV            -log10_pvalue  Percentage_altered
CDKN2B  Deletion       2.72E+01       69
CDKN2A  Deletion       2.72E+01       69
RPL22   Deletion       1.057654569    36
GATA6   Amplification  4.22184875     42
EGFR    Amplification  2              34
CCND1   Amplification  5.698970004    32
CDK6    Amplification  3.22184875     24
GATAD1  Amplification  3.22184875     24
KRAS    Amplification  5.698970004    24
MYB     Amplification  1.698970004    16
GATA4   Amplification  1.096910013    16
MYC     Amplification  2.22184875     52
CCNE1   Amplification  0              0
TSHZ3   Amplification  0              0
ERBB2   Amplification  0              0

CCNE1, TSHZ3 and ERBB2 are all altered in zero percent of samples, so I don't have a p-value for them and I set it to log10(1) = 0. I would therefore expect to see three bubbles at zero on the plot, but I see only one. Please correct me if I am wrong here.

I want to show the gene label for all genes, if possible.

enter image description here


If their p-value and percentage are the same, the points will be exactly on top of each other.
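One possible way to make this visible, sketched in base R under the assumption that a single combined label per position is acceptable: collapse the genes that share identical coordinates into one label. The data below is a subset of the full table posted above, and the column name `neg_log10_p` is chosen here for illustration.

```r
# Subset of the full data; the three zero-percent genes share the same coordinates
df_full <- data.frame(
  gene = c("MYC", "CCNE1", "TSHZ3", "ERBB2"),
  neg_log10_p = c(2.22184875, 0, 0, 0),
  Percentage_altered = c(52, 0, 0, 0),
  stringsAsFactors = FALSE
)

# One row per unique position, with the gene names pasted together
merged <- aggregate(gene ~ neg_log10_p + Percentage_altered, data = df_full,
                    FUN = paste, collapse = ", ")

print(merged)
```

The merged `gene` column can then be used as the label in `geom_text_repel()`, so the single bubble at zero carries all three names. If separate bubbles are preferred, jittering the points slightly is another option, at the cost of plotting positions that no longer match the data exactly.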
