Question

Effect of different sample sizes on p-value

3.6 years ago

c_u ▴ 530

Hello,

I have created the following figure. [image not available]

I have a pretty significant p-value comparing the two groups. The two samples are basically just lists of numbers that are genome sizes. I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions, the second being the main one:

1. Should I perform any kind of multiple-testing correction here? I would think not, since I am doing just one test.
2. Is it problematic that the two sample sizes are different? I know that if one were 5 and the other 5000, that would not be a very powerful test, but in this case (or a case with similar numbers) would it lead to spurious p-values? If it would, I can take a random subsample of 341 data points from 'source 2' to make them equal.

Thank you

statistics conceptual • 964 views


score 2 · Answer 1 · 2021-08-18

The larger your sample size, the more statistical power you will tend to have, meaning you can detect differences of smaller magnitude. The question you should ask alongside the calculation of a p-value is what magnitude of difference is biologically meaningful. For example, if you have a significant p-value but an average difference in genome size of 1 kb, I doubt that it's biologically interesting.
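To make the power argument concrete, here is a minimal sketch using simulated data. Everything in it is an assumption for illustration: the genome sizes are drawn from normal distributions with made-up means and spread, and the p-value comes from a Welch-style test with a normal approximation (reasonable only for large samples), not from whatever test was used for the figure. The point is that the same small standardized difference goes from undetectable to extremely "significant" as n grows, while the effect size stays small.

```python
import math
import random
import statistics


def welch_z_test(a, b):
    """Two-sided Welch-style test using a normal approximation to the statistic."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


def cohens_d(a, b):
    """Standardized mean difference based on the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


rng = random.Random(0)
results = {}
for n in (50, 50_000):
    # Hypothetical genome sizes in Mb: true means differ by 0.05 Mb, SD is 0.5 Mb
    group_1 = [rng.gauss(5.00, 0.5) for _ in range(n)]
    group_2 = [rng.gauss(5.05, 0.5) for _ in range(n)]
    _, p_value = welch_z_test(group_1, group_2)
    results[n] = (p_value, cohens_d(group_1, group_2))
    print(f"n={n}: p={p_value:.3g}, Cohen's d={results[n][1]:.3f}")
```

Reporting an effect size (here Cohen's d, or simply the raw difference in Mb) next to the p-value lets you judge whether the difference is large enough to matter biologically.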

Going back to your test choice, your test should reflect the parameter that interests you the most. For example, a KS test is sensitive to distribution shape, so you could have identical means but still get a significant p-value, a conclusion that might not be terribly interesting or relevant to your question. Just make sure the test used is in line with your parameter of interest.
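The KS point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two samples are simulated with the same mean but different spread, the sample size of 2000 for the second group is hypothetical (only the 341 comes from the question), the KS p-value uses the asymptotic Kolmogorov series, and the mean comparison uses a normal approximation. The helper names are made up for this example.

```python
import bisect
import math
import random
import statistics


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Asymptotic two-sample KS p-value (Kolmogorov series); rough for small samples."""
    lam = d * math.sqrt(n * m / (n + m))
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


def mean_test_pvalue(a, b):
    """Two-sided Welch-style comparison of means, normal approximation."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(1)
# Same true mean (5 Mb), different spread: only the distribution shape differs
source_1 = [rng.gauss(5.0, 0.5) for _ in range(341)]
source_2 = [rng.gauss(5.0, 1.5) for _ in range(2000)]

ks_d = ks_statistic(source_1, source_2)
ks_p = ks_pvalue_asymptotic(ks_d, len(source_1), len(source_2))
mean_p = mean_test_pvalue(source_1, source_2)
print(f"KS: D={ks_d:.3f}, p={ks_p:.3g}")
print(f"Mean comparison: p={mean_p:.3g}")
```

In a setup like this the KS test will typically flag a strong difference even though the means are the same by construction, so which result is relevant depends on whether you care about the average genome size or the whole distribution.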