How to evaluate the statistical significance of the distribution of breakpoints between two datasets
9.5 years ago
alec_djinn

I am studying the distribution of breakpoints across different human genomes, looking for hotspots in the "sample" genomes that are enriched in breakpoints. To do so, I have divided each chromosome into 10 kb bins and counted how many breaks fall in each bin. I have done the same for some control datasets and for randomly generated datasets. At this point, what is the best statistical test to determine a p-value for each bin?

The data I have looks like this:

                        Sample       Control
Breaks_bin1             10           3
Breaks_bin2             15           6
Breaks_bin3             5            3
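The binning step described above can be sketched like this, assuming breakpoints are given as 0-based integer positions on a single chromosome (the input format is an assumption, and `count_breaks_per_bin` is a hypothetical helper). Running it with the same bin size on the sample, control and random datasets gives directly comparable per-bin counts like those in the table.

```python
BIN_SIZE = 10_000  # 10 kb bins, as described in the question


def count_breaks_per_bin(positions, chrom_length, bin_size=BIN_SIZE):
    """Count breakpoints falling into consecutive fixed-size bins.

    positions: 0-based breakpoint coordinates on one chromosome (assumed format).
    The last bin may be shorter than bin_size if chrom_length is not a multiple.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // bin_size] += 1
    return counts


counts = count_breaks_per_bin([5, 9_999, 10_000, 25_000], chrom_length=30_000)
```

Here the first two positions land in bin 0 and the others in bins 1 and 2.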
9.5 years ago

The way you present the problem, it looks like you want to detect differences in counts between conditions. In that case I would look at methods developed for differential gene expression from RNA-Seq (DESeq, edgeR, limma/voom). Your 10 kb windows would be the "genes" and your break counts the expression levels. If you don't have replicates of each condition, be careful how you interpret the results, though. You will probably also need to pre-filter your data, removing windows with very low counts in both conditions, to cut down the number of tests.
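A minimal sketch of the pre-filtering step and of the "windows as genes" layout, assuming the bins are consecutive on one chromosome. The `min_total=10` threshold, the `chr:start-end` window IDs and the `count_table` helper are illustrative assumptions, not values from the thread. The resulting tab-separated table (one row per window, one column per dataset) has the usual shape of a count matrix for edgeR or DESeq2, which themselves run in R.

```python
def count_table(chrom, sample_counts, control_counts, bin_size=10_000, min_total=10):
    """Build a window-by-dataset count table, dropping low-count windows.

    Assumes bin i covers [i * bin_size, (i + 1) * bin_size) on `chrom`.
    min_total is a hypothetical threshold on the combined break count.
    """
    rows = [("window_id", "Sample", "Control")]
    for i, (s, c) in enumerate(zip(sample_counts, control_counts)):
        if s + c < min_total:  # too few breaks in total to be worth testing
            continue
        start = i * bin_size
        rows.append((f"{chrom}:{start}-{start + bin_size}", s, c))
    return rows


rows = count_table("chr1", [10, 15, 5], [3, 6, 3])
print("\n".join("\t".join(map(str, row)) for row in rows))
```

With the example counts from the question, the third bin (5 + 3 = 8 breaks) falls below the threshold and is dropped.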

Yes, indeed I am trying to detect differences in counts between samples and controls. However, since the data come from different labs, I am looking for a proper statistical approach to validate the findings, i.e. to determine whether the difference in counts is significant (p-value) or not, and I would like to do it with a scipy.stats function or something similar. I cannot figure out which approach is best: chi-square, Fisher, Pearson? I am getting different results from each of them and I am not sure which one fits my data best.
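One standard option for comparing two count datasets bin by bin without replicates (swapped in here as an illustration; the thread itself does not prescribe it) is the conditional binomial test: given n = s + c breaks in a bin, under the null hypothesis the sample count s follows Binomial(n, p0), with p0 taken from the genome-wide totals. The Pearson chi-square test relies on a large-sample approximation and becomes unreliable when expected counts are small (a common rule of thumb is below 5), which is typical for 10 kb bins. The sketch uses only the standard library; `scipy.stats.binomtest(s, n, p0).pvalue` should give the same two-sided p-value, and `scipy.stats.fisher_exact` on the 2x2 table [[s, S - s], [c, C - c]] (S, C = dataset totals) is a closely related alternative. Because thousands of bins are tested, a Benjamini-Hochberg correction is included. All function names are hypothetical, and the test assumes the datasets differ only in total depth, ignoring the lab-to-lab variability that replicates in edgeR/DESeq2 would capture.

```python
import math


def binom_pmf(i, n, p):
    """Binomial probability P(X = i), computed in log space to avoid overflow."""
    log_coef = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return math.exp(log_coef + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # relative tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def per_bin_tests(sample_counts, control_counts):
    """Per-bin conditional binomial test of sample vs. control break counts."""
    total_s, total_c = sum(sample_counts), sum(control_counts)
    if total_s == 0 or total_c == 0:
        raise ValueError("both datasets need at least one break")
    # Null hypothesis: in every bin, a break comes from the sample with the same
    # probability as genome-wide (i.e. the datasets differ only in total depth).
    p0 = total_s / (total_s + total_c)
    pvals = [binom_two_sided_p(s, s + c, p0)
             for s, c in zip(sample_counts, control_counts)]
    return pvals, benjamini_hochberg(pvals)


pvals, padj = per_bin_tests([10, 15, 5], [3, 6, 3])
```

Bins with small adjusted p-values and s / (s + c) above p0 would be candidate sample-enriched hotspots.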
