Hi everyone,
I've recently been analyzing DNA methylation data and have run into a problem:
As we know, the distribution of DNA methylation can vary across genomic features (core promoters, enhancers, CpG islands, etc.). I want to measure the distribution bias among these genomic features.
In other words, I want to quantify the deviation between the expected and observed numbers of DNA methylation sites.
I read some papers and found various methods used for this kind of analysis, for example the independent t-test, Chi-square test, Mann-Whitney U test, and permutation test, which left me confused about which one to choose.
I have tried the independent t-test and calculated ratio = log2(mean of observed / mean of expected) for plotting a heatmap (in this result, if the ratio > 0, I say DNA methylation occurs more often in that region, and vice versa). However, someone told me that the Chi-square test may be better at measuring the difference between observed and expected, so I tried that too. The problem is that I only get a chi-square statistic for each genomic feature, and these values vary enormously (from about 300 to 40,000,000), which makes them difficult to visualize.
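For context, here is roughly what I'm doing, as a minimal sketch; the feature names and counts are made up, and the per-feature chi-square compares in-feature vs. out-of-feature site counts against expectation:

```python
import numpy as np
from scipy.stats import chisquare

# Observed methylation-site counts per genomic feature, and the counts
# expected if sites were distributed proportionally (all values made up).
features = ["core_promoter", "enhancer", "CpG_island"]
observed = np.array([120_000.0, 45_000.0, 300_000.0])
expected = np.array([80_000.0, 60_000.0, 150_000.0])

# log2 enrichment ratio used as the heatmap value:
# > 0 means methylation occurs more often than expected in that feature.
log2_ratio = np.log2(observed / expected)

# Per-feature chi-square goodness of fit on (in-feature, out-of-feature)
# counts; n_total is the genome-wide total number of methylation sites.
n_total = 5_000_000.0
for name, obs, exp, ratio in zip(features, observed, expected, log2_ratio):
    stat, p = chisquare([obs, n_total - obs], f_exp=[exp, n_total - exp])
    print(f"{name}: log2(obs/exp) = {ratio:+.2f}, chi2 = {stat:.1f}, p = {p:.3g}")
```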
So, I have several questions:
- Which method do you think is better for this kind of problem?
- If the Chi-square test is used, how should I handle the chi-square statistic for visualization (normalize the chi-square of each region against a random region?)? One idea I've tried is converting it to an effect size; see the first sketch after this list.
- I noticed the p-values are typically tiny (around 1e-100 is common, and sometimes even exactly 0 is reported). I looked at some answers on statistical testing with very large datasets and found no clear conclusion. So, when you run a statistical test on a very large dataset (sample sizes on the order of 10^6 are usual in bioinformatics), how do you handle the extremely small p-values? My current workaround is in the second sketch below.
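For question 2, one option I've been experimenting with (instead of normalizing against a random region) is converting each chi-square statistic into Cohen's w effect size, which divides out the sample size so that features with wildly different chi-square values land on one scale; a minimal sketch, with a made-up total site count:

```python
import numpy as np

def cohens_w(chi2_stat, n_total):
    # Effect size for a chi-square test: w = sqrt(chi2 / n).
    # Unlike the raw statistic, w does not grow with the number of sites,
    # so it is comparable across features and usable as a heatmap value.
    return np.sqrt(chi2_stat / n_total)

n = 5_000_000  # made-up genome-wide total of methylation sites
for stat in (300, 40_000_000):
    print(f"chi2 = {stat:>10,}: w = {cohens_w(stat, n):.4f}")
```

Alternatively, I keep the signed log2(observed/expected) ratio as the heatmap value and use the test only to flag which features are significant.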
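For question 3, my workaround so far is to compute the p-value in log space so it never underflows to exactly 0, and to plot a (possibly capped) -log10(p). This sketch assumes df = 1 (a two-category in/out comparison) and uses the identity sf_chi2(x; df=1) = 2·Φ(−√x) together with scipy.special.log_ndtr, which stays accurate far beyond double-precision limits; the helper name is my own:

```python
import numpy as np
from scipy.special import log_ndtr

def neg_log10_p_chi2_df1(stat):
    # -log10 of the p-value for a df=1 chi-square statistic, without
    # underflow: log(p) = log(2) + log(Phi(-sqrt(stat))), computed directly.
    log_p = np.log(2.0) + log_ndtr(-np.sqrt(stat))  # natural log of p
    return -log_p / np.log(10.0)

print(neg_log10_p_chi2_df1(300.0))         # ~66.5
print(neg_log10_p_chi2_df1(40_000_000.0))  # ~8.7e6, reported as "p = 0" naively

# For plotting, cap the values so one extreme feature doesn't flatten the rest:
cap = 300.0
plot_value = min(neg_log10_p_chi2_df1(40_000_000.0), cap)
```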
Thanks for your time, really appreciate any answers!