Hello
I have the proportion of samples altered for a list of genes, with the related p-values, like this:
Gene CNV -log10_pvalue Percentage_altered
CDKN2B Deletion 3 69
CDKN2A Deletion 3 69
RPL22 Deletion 0.087568 33
GATA6 Amplification 2.974694135 44
EGFR Amplification 1.958607315 42
CCND1 Amplification 2.999132278 36
CDK6 Amplification 2.795880017 30
GATAD1 Amplification 2.795880017 30
KRAS Amplification 2.999132278 22
MYB Amplification 1.677780705 16
GATA4 Amplification 1.091514981 13
MYC Amplification 2.22184875 52
CCNE1 Amplification -0.000434077 0
TSHZ3 Amplification -0.000434077 0
ERBB2 Amplification -0.000434077 0
I want to visualise this data like the plot below, but I don't know how.
Any help please?
It seems to me that this is a mix of an inverted volcano plot and a bubble plot. In my experience, two links that can help you achieve this are:
1. https://www.r-graph-gallery.com/320-the-basis-of-bubble-plot.html
2. https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html
You would definitely need to tweak the code. Is there a GitHub link in the paper you are referring to? Digging into that might also give some leads.
You are currently missing the variable they used in their y-axis.
It does not seem so. The y-axis here refers to the frequency of gain and deletion (%), which in the OP's data is the last column (Percentage_altered), if I understand correctly.
Edit: I think the y-axis and the point size are the same variable, but on the y-axis they effectively make the percentage negative for deletions and positive for gains.
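As a rough illustration of that sign convention, here is a minimal Python sketch. It assumes the x-axis is the -log10 p-value, which nobody in this thread has confirmed from the original figure. The helper names (`signed_percentage`, `plot`), the colours, the size scaling, and the output file name are all made up for the example, and the plotting part needs matplotlib installed.

```python
# Rows copied from the question: (gene, CNV type, -log10 p-value, % altered)
rows = [
    ("CDKN2B", "Deletion", 3.0, 69),
    ("CDKN2A", "Deletion", 3.0, 69),
    ("RPL22", "Deletion", 0.087568, 33),
    ("GATA6", "Amplification", 2.974694135, 44),
    ("EGFR", "Amplification", 1.958607315, 42),
    ("CCND1", "Amplification", 2.999132278, 36),
    ("CDK6", "Amplification", 2.795880017, 30),
    ("GATAD1", "Amplification", 2.795880017, 30),
    ("KRAS", "Amplification", 2.999132278, 22),
    ("MYB", "Amplification", 1.677780705, 16),
    ("GATA4", "Amplification", 1.091514981, 13),
    ("MYC", "Amplification", 2.22184875, 52),
    ("CCNE1", "Amplification", -0.000434077, 0),
    ("TSHZ3", "Amplification", -0.000434077, 0),
    ("ERBB2", "Amplification", -0.000434077, 0),
]


def signed_percentage(cnv, pct):
    """Return the percentage as negative for deletions, positive otherwise."""
    return -pct if cnv == "Deletion" else pct


# (gene, CNV type, x = -log10 p-value, y = signed %, unsigned % for point size)
points = [
    (gene, cnv, x, signed_percentage(cnv, pct), pct)
    for gene, cnv, x, pct in rows
]


def plot(points, path="cnv_bubble.png"):
    """Draw a bubble plot; the axis choice here is an assumption, not the paper's."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(7, 5))
    xs = [p[2] for p in points]
    ys = [p[3] for p in points]
    sizes = [10 + 5 * p[4] for p in points]  # arbitrary scaling so 0% stays visible
    colors = ["tab:blue" if p[1] == "Deletion" else "tab:red" for p in points]
    ax.scatter(xs, ys, s=sizes, c=colors, alpha=0.6)
    for gene, _, x, y, _ in points:
        ax.annotate(gene, (x, y), fontsize=7)
    ax.axhline(0, color="grey", linewidth=0.8)  # separates gains from deletions
    ax.set_xlabel("-log10(p-value)")
    ax.set_ylabel("% of samples altered (deletions shown as negative)")
    fig.savefig(path, dpi=150)
```

Calling `plot(points)` should write the figure to `cnv_bubble.png`. The same idea carries over to ggplot2 in R: multiply Percentage_altered by -1 for deletions, then map the unsigned percentage to point size.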