Hi, this is a statistics question, and I hope you can help me! On a DNA sequence region, there are several mutation hits. I want to know whether the distribution of hits is consistent with a random (uniform) distribution or not. In other words, imagine a dart target: how can I get a p-value to tell whether the darts have been thrown randomly or not?
The chi-squared test relies on an approximation that holds for large sample sizes. You may need to divide the DNA region into sub-bins large enough that several mutations are expected in each one (a common rule of thumb is at least five expected per bin). Also be careful about the mutational process that generated those mutations. It may prefer certain base contexts (e.g. CpG mutations), and that would change the expected distribution away from uniform because of the particular sequence of the DNA region.
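To make the binning idea concrete, here is a minimal sketch in Python using only the standard library. It computes the chi-squared statistic over equal-width bins, but instead of the large-sample approximation it estimates the p-value by simulating uniformly placed mutations, which is one way around small expected counts. The function names, the 0-based position convention, the number of bins, and the example positions are all assumptions made for illustration, not something from your data.

```python
import random


def bin_counts(positions, region_length, n_bins):
    """Count hits per bin; positions are 0-based offsets within the region."""
    counts = [0] * n_bins
    for pos in positions:
        counts[pos * n_bins // region_length] += 1
    return counts


def chi2_stat(counts, expected):
    """Pearson chi-squared statistic against the same expected count per bin."""
    return sum((c - expected) ** 2 / expected for c in counts)


def uniformity_p_value(positions, region_length, n_bins, n_sim=2000, seed=0):
    """Return (statistic, Monte Carlo p-value) under a uniform-placement null.

    Assumes region_length is a multiple of n_bins so the bins are equal-sized.
    """
    rng = random.Random(seed)
    n = len(positions)
    expected = n / n_bins
    observed = chi2_stat(bin_counts(positions, region_length, n_bins), expected)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Null model: every base in the region is equally likely to be hit.
        sim = [rng.randrange(region_length) for _ in range(n)]
        if chi2_stat(bin_counts(sim, region_length, n_bins), expected) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_sim + 1)


# Hypothetical example: 40 hits crowded into the first 80 bases of a 1000 bp region.
clustered = list(range(0, 80, 2))
stat, p = uniformity_p_value(clustered, region_length=1000, n_bins=10)
print(stat, p)
```

A small p-value here suggests the hits are more clustered than uniform placement would produce. To account for base-context preferences such as CpG sites, the uniform draw in the simulation would need to be replaced by one weighted by each position's expected mutability, which this sketch does not attempt.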