Hi folks! I need your help with a ggplot2 representation.
I have a data set that has the following structure:
log2FoldChange Sequence_biotype Knockdown
-1.40 LTR A
-1.11 DNA B
-3.46 Protein A
-1.25 Protein C
1.03 DNA B
... ... ...
I am plotting the foldChanges as boxplots, one for each Sequence_biotype
. I have the knockdown variable faceted.
What I'm trying to do is to do a one sample Wilcoxon test for each boxplot, comparing the log2FoldChange to 0 (to see if there is a significant change). That can be achieved with the following code:
wilcox.test(x = data, mu = 0)
when the data is grouped by Sequence_biotype and Knockdown.
My question is, how could I introduce the results of the wilcox test to the plot as labels or how could I compute it directly in the plot (I have been trying with stat_compare_means(method = "wilcox", paired = FALSE)
from ggpubr package but all the pvalues are piling up to the same spot)
Thank you before hand, any help is appreciated!
Best,
Jordi
Hi,
Can you provide the code that you've tried as well the figure that you got?
António
Sure! This is the code and the plot I currently have
plot_generated
Hi! An expansion of y axis may help to avoid piling up stat_compare_means() results. Alternatively, results of Wilcoxon test may be added on the plot using ggpubr::stat_pvalue_manual(). In such cases, I compute Wilcoxon using ggpubr::compare_means() - maybe because it is directly compatible with ggpubr::stat_pvalue_manual(), I don't recall why)))
I have tried the compare_means approach and it works, I obtain the kind of data that I am looking for. However, I still don't know how to embed the data into the plot, I tried with
stat_pvalue_manual()
but I have the same problem as withstat_compare_means()
, that the values are piled up on top of each other. Any clue why?I guess that the problem arises because stat_compare_means compares all groups against all groups - there are too many values to plot. If you specify ref.group or comparisons, p-values will move to the respective bars. As far as I know, it will require an explicit plotting of compared groups on x-axis.
Yes indeed, if I specify a reference group all the labels move to the right place! The problem is that I don't want any reference group, but to compare it to mu=0... It is a wilcoxon one sample test what I would like to do
There should be one p-value for each facet, right?
One p-value for each biotype (or boxplot) in each facet. In other words, if I have 6 boxplots per facet, I should get 6 p-values per facet
If so, I can suggest making a table with p-values calculated for the corresponding Sequence_biotype and Knockdown type, then add arbitrary y-axis (log2FoldChange) value higher than its actual values to place p-values on top of the bars (say, 5) and finally add this table to the plot with geom_text(data = your_table, aes(label = p_values)). The presence of biotypes, facetting variable and y-axis values will take care of proper p-value positioning. Since so many p-values will overlap, you can 1) rotate p-values or a whole plot 2) incrementally increase y-axis value for each next p-value. For 6 Sequence_biotypes and 3 Knockdown types sorted in accordance with your plot, it will be simply like rep(seq(5, 7, length.out = 6), 3).
Thank you so much! This work smoothly! I am pasting the code with the answer and closing the question