Hi all.
I heard that most CNV detection tools are based on read depth and make a Gaussian assumption about the distribution of the read count ratio.
From this point of view, what do the x-axis and y-axis of the Gaussian distribution represent? I was thinking of the y-axis as the read-depth count and the x-axis as the position within exons. Is that right? If so, does each exon in exome sequencing follow a Gaussian distribution?
With that picture in mind, I cannot connect it to the passage below. Could you take a look and advise?
Most of the existing tools for CNV calling that are based on read depth, such as ExomeCNV and CNV-seq, make Gaussian assumptions about the distribution of read count ratio. In the absence of technical variability, the proportion of reads matching to a specific sample should follow a binomial distribution whose success rate is determined by genome-wide read count ratio between the test sample and reference set.
My understanding of the passage is that a single sequenced sample follows a Gaussian distribution, while the sample-to-sample comparison follows a binomial distribution.
Is my understanding correct?
Say we have a binomial distribution B(n,p). If n is large and both np and n(1-p) are not small, it can be approximated by a Gaussian distribution N(np, np(1-p)). See the Wikipedia page on the binomial distribution.
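
To make the approximation concrete, here is a minimal Python sketch that compares the exact binomial probabilities B(n,p) with the Gaussian density N(np, np(1-p)) at every integer count. The function names and the example values of n and p are my own, chosen only for illustration; they are not taken from ExomeCNV, CNV-seq, or any other tool. One way to read it, as an assumption on my part, is that n is the total number of reads at an exon across the test and reference samples and p is the genome-wide fraction of reads coming from the test sample.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k) for X ~ B(n, p), with 0 < p < 1.

    Computed in log space so that large n does not overflow.
    """
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def normal_pdf(x, mean, var):
    """Density of the Gaussian N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_abs_diff(n, p):
    """Largest gap between B(n, p) and N(np, np(1-p)) over k = 0..n."""
    mean, var = n * p, n * p * (1 - p)
    return max(abs(binom_pmf(k, n, p) - normal_pdf(k, mean, var))
               for k in range(n + 1))

if __name__ == "__main__":
    # Illustrative values only: a well-covered case and a poorly covered one.
    for n, p in [(1000, 0.5), (10, 0.05)]:
        print(f"n={n}, p={p}, np={n * p}, max |binomial - Gaussian| = "
              f"{max_abs_diff(n, p):.6f}")
```

With large n and both np and n(1-p) well away from zero, the gap should be tiny; with small n and np below 1 the Gaussian is visibly off. So the x-axis of the distribution is the read count (or read count ratio) for a region, and the y-axis is how probable that count is, not the position along the exon.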