Question

Compare boxplot with Wilcoxon test

4.1 years ago

FKM • 0

Hello,

I am comparing two groups of lengths (from different individuals) with boxplots, using the ggplot2 package. I want to compare the two distributions, but so far the only way I have found to use a Wilcoxon test is stat_compare_means from the "ggpubr" package. Is it the right way to compare the distributions? Can I compare the distributions and not the means specifically? As you can see, I am a newbie in the stats world. Thank you!

boxplot wilcoxon ggplot ggpubr • 5.2k views

ADD COMMENT • link updated 4.1 years ago by Alex Reynolds 36k • written 4.1 years ago by FKM • 0

Are your data normally distributed? (i.e. does the population of each group of lengths look Gaussian?) If so, one could use a t-test to compare the two normal distributions (the null hypothesis would be that they have the same mean).

ADD REPLY • link 4.1 years ago by seidel 11k

score 1 · Answer 1 · 2021-03-14

4.1 years ago

Alex Reynolds 36k

Non-parametric tests may be preferred if you cannot assume your data are normally distributed. Comparisons of medians, for instance, may be preferred to comparisons of means. If you retrieve the statistics behind your boxplots (the actual numbers for the median and IQR), you can compare the 95% CIs (confidence intervals) around the medians of different data sets. If those CIs overlap, you might argue that there is no evidence of a significant difference between groups. See https://www.nature.com/articles/nmeth.2813.pdf for more detail about boxplots generally.
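As a rough sketch of that comparison in base R: boxplot.stats() returns the notch limits of a boxplot in its conf element, computed as median +/- 1.58 * IQR / sqrt(n), which is commonly read as an approximate 95% interval for comparing medians. The data and the names group_a and group_b below are simulated and purely illustrative.

```r
# Simulated, skewed lengths for two hypothetical groups (illustrative only)
set.seed(1)
group_a <- rlnorm(200, meanlog = 5.0, sdlog = 0.6)
group_b <- rlnorm(150, meanlog = 5.1, sdlog = 0.6)

# boxplot.stats() gives the five-number summary in $stats and the notch
# limits in $conf, i.e. median +/- 1.58 * IQR / sqrt(n)
ci_a <- boxplot.stats(group_a)$conf
ci_b <- boxplot.stats(group_b)$conf

# Two intervals overlap when each one starts before the other ends
overlap <- ci_a[1] <= ci_b[2] && ci_b[1] <= ci_a[2]

print(rbind(group_a = ci_a, group_b = ci_b))
print(overlap)
```

Overlap of these intervals is only an informal check; it does not replace a formal test.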

ADD COMMENT • link 4.1 years ago by Alex Reynolds 36k

Thank you for the resources for understanding boxplots better! I confirm I cannot assume a normal distribution, hence the Wilcoxon test. If these are the ways of comparing the distributions, then I might actually be fine with stat_compare_means and perform the test comparing the means. The boxplots are almost identical, so I do not expect a significant p-value.

ADD REPLY • link 4.1 years ago by FKM • 0

There are two "Wilcoxon" tests built into R. They are distinct, and you would pick one or the other depending on whether you are comparing paired or unpaired samples and on what difference, if any, you are trying to establish.

For paired samples, you would run the Wilcoxon signed-rank test. This does not test the difference in means, but the ranked differences between pairs of measurements.

In R, you would use wilcox.test(x, y, paired=TRUE) to run this test.

For unpaired samples, you would run the Mann-Whitney U test, also called the Mann-Whitney-Wilcoxon or Wilcoxon rank-sum test, which tests differences in magnitude between groups.

In R, you would use wilcox.test(x, y, paired=FALSE) to run this test.

In both the paired and unpaired versions of this test, the x and y variables are numeric vectors containing the values you want to compare.
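A minimal sketch of both calls on simulated data follows. The vectors x, y, before and after are made up for illustration. The last step re-runs the unpaired test on log-transformed values, which changes the means but not the ranks, to show that the result depends on ranks only.

```r
set.seed(42)

# Unpaired: two independent groups; the group sizes may differ
x <- rexp(30, rate = 1 / 100)
y <- rexp(45, rate = 1 / 120)
unpaired <- wilcox.test(x, y, paired = FALSE)

# Paired: the same units measured twice, so both vectors have equal length
before <- rnorm(20, mean = 10, sd = 2)
after  <- before + rnorm(20, mean = 0.5, sd = 1)
paired <- wilcox.test(before, after, paired = TRUE)

print(unpaired$method)
print(unpaired$p.value)
print(paired$method)
print(paired$p.value)

# A monotone transform such as log() alters the means but keeps the ranks,
# so the unpaired test gives the same p-value
log_version <- wilcox.test(log(x), log(y), paired = FALSE)
print(all.equal(unpaired$p.value, log_version$p.value))
```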

Which test you run depends on your experimental setup and what you want to say about the difference between groups, if any.

An example of paired datasets would be before/after conditions: say, plates of cells on which you measure some condition with saline solution or buffer ("control"). You then treat those same plates with a drug ("treatment") and repeat the measurement. The drug response might depend on the cells' response to buffer.

Unpaired data would be taking plates at random, and either treating them with buffer or with a drug. There is no before/after condition on plates that connects treatment and control.

For more information on the wilcox.test function, you can run ?wilcox.test in R to get a rundown of what this test does, along with other parameters and usage examples.

Putting a question mark ? in front of function names is a good way to learn or look up bits of R, generally.

I would gently advise against using plotting scripts to run statistical tests. Plotting libraries like ggplot2 and related packages are best used for making figures. Statistical tests can be run with dedicated functions, many of which are already available in base R. Their implementations are stable and reproducible.
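One way to keep the two steps separate, sketched here with base R only, is to compute one p-value per plot panel first and collect the results in a small table. The data frame df and its columns panel, group and size are hypothetical names chosen for this illustration.

```r
set.seed(7)

# Hypothetical long-format data: one row per measurement
df <- data.frame(
  panel = rep(c("chr1", "chr2", "chr3"), each = 60),
  group = rep(rep(c("A", "B"), times = c(35, 25)), times = 3),
  size  = rlnorm(180, meanlog = 6, sdlog = 1)
)

# One unpaired Wilcoxon test per panel, using the formula interface
p_by_panel <- sapply(split(df, df$panel), function(d) {
  wilcox.test(size ~ group, data = d)$p.value
})

# Collect p-values and ready-made text labels in one table
labels <- data.frame(
  panel = names(p_by_panel),
  p     = unname(p_by_panel),
  label = sprintf("p = %.3g", p_by_panel)
)
print(labels)
```

A table like this could then be handed to a text layer in the plot (for instance ggplot2's geom_text() with its own data argument), so the figure only displays numbers that were computed elsewhere.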

ADD REPLY • link 4.1 years ago by Alex Reynolds 36k

Thank you for the extensive explanation. I wanted a command that adds the test to the graph, since I am using large datasets and facets, but you are right: it might be a better idea to compute the test separately and then plot the p-values in the graphs. My data include paternally and maternally inherited genetic variations (the groups have different counts) for which I am comparing a shared feature (their size), so I am treating them as an unpaired dataset.

ADD REPLY • link 4.1 years ago by FKM • 0

I have used the wilcox.test function to compute my unpaired test and get the p-values. I found that the p-values are the same as the ones obtained using the stat_compare_means function from the ggpubr package. The command's documentation states that the latter uses means to perform the test. Is this a coincidence, or is there a mathematical reason behind it? Maybe the base R wilcox.test also uses the means? Secondly, what did you mean by "which tests differences in the magnitude between groups"? What is the magnitude in this case?

ADD REPLY • link 4.1 years ago by FKM • 0