Hi all,
I am writing a tool in Python that reads the allele-fraction data [1] for each SNP (from a DNA-seq experiment on solid tumor samples) and tries to find the change points in the data track. However, I have noticed that the standard deviation (noise level) in regions with amplification is much higher than in regions with deletion. Why is this the case? It affects how well the tool detects the change points.
The following figures illustrate the idea. Fig. 1 shows a region on chr4 with a deletion, Fig. 2 shows a region on chr9 with an amplification, and Fig. 3 shows a snapshot of how the tool currently works.
Fig. 1: chr4 region with deletion
Fig. 2: chr9 region with amplification
Fig. 3: detected change points with different window sizes
Thanks in advance for sharing your ideas with me! :)
[1] The allele fraction for each SNP is calculated as: (# alternate-allele reads / # total reads)
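For context (this is not the poster's actual code, just a minimal sketch of the two ingredients described above): compute the per-SNP allele fraction as in [1], then score candidate change points with a simple left/right windowed mean-shift statistic. The array names and the scoring rule are my own placeholders, not the tool's real implementation.

```python
import numpy as np

def allele_fraction(alt_counts, total_counts):
    """Allele fraction per SNP: # alternate-allele reads / # total reads (as in [1])."""
    alt = np.asarray(alt_counts, dtype=float)
    total = np.asarray(total_counts, dtype=float)
    # Avoid division by zero at SNPs with no coverage.
    return np.divide(alt, total, out=np.zeros_like(alt), where=total > 0)

def windowed_mean_shift(values, window=50):
    """Score each position by |mean(left window) - mean(right window)|.
    Peaks in this score are candidate change points; the window size plays
    the same role as in Fig. 3."""
    values = np.asarray(values, dtype=float)
    scores = np.zeros(len(values))
    for i in range(window, len(values) - window):
        left = values[i - window:i]
        right = values[i:i + window]
        scores[i] = abs(left.mean() - right.mean())
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated track: a balanced segment followed by a noisier shifted segment.
    af = np.concatenate([
        rng.normal(0.50, 0.03, 300),   # balanced region, low spread
        rng.normal(0.65, 0.10, 300),   # amplified-like region, higher spread
    ])
    scores = windowed_mean_shift(af, window=50)
    print("candidate change point near index:", int(scores.argmax()))
```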
Could it be that there are fewer discrete steps on the way down than on the way up? Lose one allele and you are at 1 + noise; lose the other and you are at noise. Gain an allele and you are at 3 + noise, and then the sky is the limit...
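To illustrate (my own toy enumeration, ignoring tumor purity and read noise): at a heterozygous SNP with total copy number n and k alternate copies, the expected allele fraction is k/n. Going down there are only a couple of attainable levels, while going up the levels multiply and crowd together, which can look like extra spread in the track:

```python
from fractions import Fraction

# Expected allele-fraction levels (alt copies / total copies) at a heterozygous
# SNP for a few total copy numbers, ignoring purity and sequencing noise.
for total_cn in range(1, 7):
    levels = sorted({Fraction(alt_cn, total_cn) for alt_cn in range(total_cn + 1)})
    print(f"copy number {total_cn}: {[str(f) for f in levels]}")
```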
One could also imagine that amplifying something to high copy numbers, e.g. through breakage-fusion-bridge cycles, is inherently messy and could result in higher overall variance.
I was thinking this might have something to do with the base (A, T, C, G) read at each SNP position.