Hi All,
I am working on a few genomic regions of interest and have quantile-normalized M values for all the probes lying in those regions. The data were obtained from an Agilent 244K CpG island array. The average distance between adjacent probes is 100 bases.
I am trying to extract statistically significant regions in a way that takes into account the M values of neighboring probes. I have pasted a sample dataset below:
ProbeName sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 chr start end
probe1 -0.532 -0.923 0.402 0.503 -0.322 0.315 0.250 -0.498 -0.178 -0.667 chr1 884379 884423
probe2 0.808 -0.550 -0.315 -1.159 -0.659 -0.255 -0.100 -1.198 -0.991 -0.686 chr1 886633 886677
probe3 0.593 0.783 0.741 0.113 0.428 0.540 0.689 1.119 0.184 0.268 chr1 886707 886751
probe4 1.378 0.695 0.312 1.710 1.284 -0.619 1.331 1.121 1.502 1.517 chr1 887101 887145
probe5 -0.089 0.559 0.636 0.165 1.225 0.416 0.426 -0.453 1.260 0.205 chr1 887255 887299
probe6 0.786 0.620 -0.267 0.214 -0.320 -0.419 0.290 -0.375 -0.419 -0.390 chr1 887342 887386
probe7 -0.533 -0.085 -0.118 -0.042 1.008 -0.171 -0.015 -0.567 -0.497 0.093 chr1 887488 887532
probe8 0.551 1.018 1.793 -0.094 0.407 1.319 1.840 0.429 2.430 0.585 chr1 887598 887642
probe9 0.064 0.772 -0.348 -0.602 0.544 -0.841 -0.082 -1.362 -1.147 -0.627 chr1 887830 887874
probe10 -0.334 0.258 0.128 0.674 0.848 0.142 0.402 0.517 0.522 0.629 chr1 888033 888077
Is there any pre-existing tool or script that uses a sliding-window approach and calculates the significance of individual probes as well as of custom regions?
Thank you
A test for significance? What are the null and alternative hypotheses?
The probes on the Agilent 244K CpG island array can be binned by CpG island. The data here come from a MeDIP experiment, in which the average fragment size varies from 100 to 800 bp, so one fragment covers roughly 4 to 5 probes. Hence, I was wondering whether, given a fixed window size, it would be possible to find statistically significantly enriched regions, i.e. regions whose M values are positive for a given group or set of arrays. The null hypothesis I had in mind was that the extent of methylation is equal to the M value of the particular probe in question. But I was not sure which statistic to apply, or how.
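For what it's worth, here is a minimal Python sketch of the fixed-window idea, not an established tool. It rests on several assumptions that are not in the thread: the null is taken to be "M values in the window are symmetric around zero" (which differs from the null described above), the window width of 600 bp is a hypothetical choice inside the 100 to 800 bp fragment range, and the function names are made up for illustration. For each window it averages M over the probes per sample, then runs an exact one-sided sign-flip test across the 10 samples.

```python
from itertools import product

# (start, M values for sample1..sample10), copied from the posted chr1 table.
PROBES = [
    (884379, [-0.532, -0.923, 0.402, 0.503, -0.322, 0.315, 0.250, -0.498, -0.178, -0.667]),
    (886633, [0.808, -0.550, -0.315, -1.159, -0.659, -0.255, -0.100, -1.198, -0.991, -0.686]),
    (886707, [0.593, 0.783, 0.741, 0.113, 0.428, 0.540, 0.689, 1.119, 0.184, 0.268]),
    (887101, [1.378, 0.695, 0.312, 1.710, 1.284, -0.619, 1.331, 1.121, 1.502, 1.517]),
    (887255, [-0.089, 0.559, 0.636, 0.165, 1.225, 0.416, 0.426, -0.453, 1.260, 0.205]),
    (887342, [0.786, 0.620, -0.267, 0.214, -0.320, -0.419, 0.290, -0.375, -0.419, -0.390]),
    (887488, [-0.533, -0.085, -0.118, -0.042, 1.008, -0.171, -0.015, -0.567, -0.497, 0.093]),
    (887598, [0.551, 1.018, 1.793, -0.094, 0.407, 1.319, 1.840, 0.429, 2.430, 0.585]),
    (887830, [0.064, 0.772, -0.348, -0.602, 0.544, -0.841, -0.082, -1.362, -1.147, -0.627]),
    (888033, [-0.334, 0.258, 0.128, 0.674, 0.848, 0.142, 0.402, 0.517, 0.522, 0.629]),
]

WIDTH = 600  # bp; hypothetical window width, not taken from the thread


def sliding_windows(starts, width):
    """For each probe, list the indices of probes starting within `width` bp
    downstream of it (itself included). Assumes one chromosome, sorted starts."""
    result = []
    for i, first in enumerate(starts):
        j = i
        while j < len(starts) and starts[j] - first <= width:
            j += 1
        result.append(list(range(i, j)))
    return result


def window_sample_means(probes, idx):
    """Average M over the probes in `idx`, separately for each sample."""
    n_samples = len(probes[idx[0]][1])
    return [sum(probes[i][1][k] for i in idx) / len(idx) for k in range(n_samples)]


def sign_flip_pvalue(scores):
    """Exact one-sided sign-flip p-value for 'mean of scores > 0'.
    Enumerates all 2**n sign assignments, so it is only sensible for small n."""
    observed = sum(scores)
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(scores)):
        total += 1
        if sum(s * x for s, x in zip(signs, scores)) >= observed - 1e-12:
            hits += 1
    return hits / total


starts = [start for start, _ in PROBES]
for idx in sliding_windows(starts, WIDTH):
    scores = window_sample_means(PROBES, idx)
    mean_m = sum(scores) / len(scores)
    print(f"chr1:{starts[idx[0]]}-{starts[idx[-1]]} probes={len(idx)} "
          f"mean_M={mean_m:.3f} p={sign_flip_pvalue(scores):.4f}")
```

With 10 arrays the smallest attainable p-value is 1/1024, and the p-values printed here are per window and not corrected for testing many overlapping windows.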
It looks like the les package, as suggested by ff.cc.cc, is able to do what is necessary.
Sliding windows are not independent: overlapping windows share probes, so their test statistics are correlated. This causes a serious problem.
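One common way to cope with that dependence is a single-step max-statistic (Westfall-Young style) permutation adjustment, which is not something the thread itself proposes. A rough Python sketch follows, under these assumptions: the null is "M is symmetric around zero", whole arrays are sign-flipped together so the correlation between windows is preserved, and each row of `window_scores` (a hypothetical name) holds one window's per-sample score. To avoid inventing numbers, the rows below are simply probe3, probe4 and probe9 from the posted table, each treated as a one-probe window.

```python
from itertools import product
from math import sqrt


def maxt_adjusted_pvalues(window_scores):
    """Single-step maxT adjusted p-values (one-sided, 'score > 0').

    window_scores: list of rows, one row per window, one value per sample.
    The statistic is sum(x) / sqrt(sum(x**2)); its denominator does not
    change under sign flips, so it is computed once per window.
    Rows must not be all zeros."""
    n_samples = len(window_scores[0])
    norms = [sqrt(sum(x * x for x in row)) for row in window_scores]

    def stats(signs):
        return [sum(s * x for s, x in zip(signs, row)) / norm
                for row, norm in zip(window_scores, norms)]

    observed = stats((1,) * n_samples)
    exceed = [0] * len(window_scores)
    total = 0
    # The same sign vector is applied to every window, which keeps the
    # dependence between overlapping windows intact under the null.
    for signs in product((1, -1), repeat=n_samples):
        total += 1
        max_stat = max(stats(signs))
        for w, obs in enumerate(observed):
            if max_stat >= obs - 1e-12:
                exceed[w] += 1
    return [count / total for count in exceed]


# Rows copied from the posted table (probe3, probe4, probe9).
window_scores = [
    [0.593, 0.783, 0.741, 0.113, 0.428, 0.540, 0.689, 1.119, 0.184, 0.268],
    [1.378, 0.695, 0.312, 1.710, 1.284, -0.619, 1.331, 1.121, 1.502, 1.517],
    [0.064, 0.772, -0.348, -0.602, 0.544, -0.841, -0.082, -1.362, -1.147, -0.627],
]
for label, p_adj in zip(("probe3", "probe4", "probe9"),
                        maxt_adjusted_pvalues(window_scores)):
    print(f"{label}: adjusted p = {p_adj:.4f}")
```

Because every window is compared against the maximum over all windows, the adjusted p-values control the family-wise error rate without assuming the windows are independent; the price is that they are never smaller than the unadjusted ones.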