I am studying the distribution of breakpoints among different human genomes, looking for hotspots in the "sample" genomes that are enriched in breakpoints. To do so, I have divided each chromosome into bins of 10 kb and counted how many breaks fall in each bin. I have done the same for some control datasets and for randomly generated datasets. At this point, what is the best statistical test I could use to determine the p-value for each bin?
The data I have looks like this:
             Sample  Control
Breaks_Bin1  10      3
Breaks_bin2  15      6
Breaks_bin3  5       3
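For context, per-bin counts like these can be produced by integer-dividing each breakpoint position by the bin size. The following is only a minimal sketch of that step, assuming breakpoints are available as 0-based base-pair coordinates on a single chromosome; the positions, the 35 kb chromosome length, and the function name `count_breaks_per_bin` are hypothetical and chosen purely for illustration.

```python
import numpy as np

BIN_SIZE = 10_000  # 10 kb bins, as described in the question


def count_breaks_per_bin(positions, chrom_length, bin_size=BIN_SIZE):
    """Return the number of breakpoints falling in each fixed-width bin."""
    positions = np.asarray(positions, dtype=np.int64)
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    bin_index = positions // bin_size      # bin 0 covers [0, bin_size)
    return np.bincount(bin_index, minlength=n_bins)


# Hypothetical breakpoint positions on a toy 35 kb chromosome
sample_positions = [120, 9_999, 10_000, 15_500, 29_000, 34_999]
counts = count_breaks_per_bin(sample_positions, chrom_length=35_000)
print(counts)  # [2 2 1 1]
```

Running the same function on the control and randomized datasets would give count vectors that line up bin by bin.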
Yes, indeed, I am trying to detect differences in counts between samples and controls. However, since the data come from different labs, I am looking for a proper statistical approach to validate the findings, that is, to determine whether the difference in counts for a given bin is significant (p-value) or not, and I would like to do it using a scipy.stats function or something similar. However, I cannot figure out which approach is best. Chi2, Fisher, Pearson? I am getting different results from each of them and I am not sure which one fits my data best.
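To make the comparison concrete, here is a minimal sketch of one way a per-bin test could be set up with scipy.stats. It is not a recommendation of a specific test. It assumes that each bin is compared against all remaining breaks in the same dataset via a 2x2 table, and it uses the totals of the three example bins as a stand-in for genome-wide totals; both choices are assumptions made for illustration only. `scipy.stats.chi2_contingency` performs Pearson's chi-squared test and, for 2x2 tables, applies Yates' continuity correction by default, which may be one reason its p-values differ from those of `scipy.stats.fisher_exact`.

```python
from scipy.stats import chi2_contingency, fisher_exact

# Example counts from the table above
sample = [10, 15, 5]
control = [3, 6, 3]

# Assumption: totals are taken over these three bins only;
# in a real analysis they would be genome-wide totals.
sample_total = sum(sample)
control_total = sum(control)

results = []
for i, (s, c) in enumerate(zip(sample, control), start=1):
    # 2x2 table: breaks inside this bin vs. breaks in all other bins
    table = [[s, sample_total - s],
             [c, control_total - c]]
    odds_ratio, p_fisher = fisher_exact(table, alternative="two-sided")
    chi2, p_chi2, dof, expected = chi2_contingency(table)  # Yates-corrected for 2x2
    results.append((i, odds_ratio, p_fisher, p_chi2))

for i, odds_ratio, p_fisher, p_chi2 in results:
    print(f"bin{i}: odds ratio={odds_ratio:.2f}  "
          f"Fisher p={p_fisher:.3f}  chi2 p={p_chi2:.3f}")
```

With counts this small, the chi-squared approximation is generally considered less reliable than the exact test. With thousands of bins tested, the per-bin p-values would also need a multiple-testing correction before being interpreted.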