This image shows the log2 ratio of tumor coverage over control coverage for each exon capture region. As you can see, there is a clear bias with GC content, making segmentation useless. Both samples are of good quality and were sequenced using the same exon capture kit (but in different sequencing batches), and contamination is unlikely (as determined by the mapping rate, which is near 100% to GRCm38). The control tissue is from liver.
Any suggestions on how I could try to correct the bias? I would expect the log2 fold changes to have equal variance across the GC content range.
I should also add that the CNV algorithms I've tried do correct for GC bias, but they were not sufficient.
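To check whether a GC trend remains after correction, one option is to bin the regions by GC fraction and look at the median and spread of the log2 ratios in each bin. The sketch below is only an illustration, not what CNVkit or any other tool does internally. The function names, the 20-bin default, and the use of the median absolute deviation (MAD) as the spread measure are all assumptions. Note that subtracting the per-bin median only recenters each bin; it does not equalize the variance across GC content.

```python
import numpy as np

def gc_bin_stats(gc, log2_ratio, n_bins=20):
    """Per-GC-bin median and MAD of log2 ratios (hypothetical helper).

    gc: GC fraction per region, in [0, 1].
    log2_ratio: log2(tumor / control) coverage per region.
    """
    gc = np.asarray(gc, dtype=float)
    log2_ratio = np.asarray(log2_ratio, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index per region; clip so that gc == 1.0 falls in the last bin.
    idx = np.clip(np.digitize(gc, edges) - 1, 0, n_bins - 1)
    medians = np.full(n_bins, np.nan)
    mads = np.full(n_bins, np.nan)
    for b in range(n_bins):
        vals = log2_ratio[idx == b]
        if vals.size:
            medians[b] = np.median(vals)
            mads[b] = np.median(np.abs(vals - medians[b]))
    return idx, medians, mads

def center_by_gc(gc, log2_ratio, n_bins=20):
    """Subtract each region's GC-bin median from its log2 ratio."""
    idx, medians, _ = gc_bin_stats(gc, log2_ratio, n_bins)
    return np.asarray(log2_ratio, dtype=float) - medians[idx]
```

If the per-bin MAD differs a lot between low, intermediate, and high GC bins, that matches the unequal variance described above, and median centering alone would not be expected to fix it.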
There are generally more segments with intermediate GC content, so wouldn't you expect an increase in variance in the middle (granted, there are a lot of points at ~0.6 and there's low variance there, so perhaps I'm overblowing this)? What seems odd to me is that the variance is asymmetric. Presuming you have a BED file with the capture probe coordinates, you could try making a blacklist out of its complement and see what computeGCBias from deepTools outputs. The performance won't be great, since I didn't write the blacklisting stuff with this in mind, but it'll at least give you a better idea about whether you really have an issue.
Thanks for the reply. I am primarily using CNVkit for my analysis, which uses regions outside the capture regions to aid in CNV analysis. I attached a plot of log2 ratios against GC content for those regions outside the capture regions (i.e. the blacklist regions, as you suggested). The image in my original post is only the capture regions.
Also, here is an image from another tumor that works well with the liver control, showing why I don't think the increased variance at different GC levels should be there.
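For the blacklist idea mentioned earlier in the thread (the complement of the capture regions), the intervals can be derived from the capture BED file plus the chromosome lengths. The sketch below is a minimal pure-Python illustration. The function name and the in-memory tuple and dict inputs are assumptions, and in practice `bedtools complement` does the same job directly on files.

```python
def complement_intervals(intervals, chrom_sizes):
    """Return regions NOT covered by `intervals` (hypothetical helper).

    intervals: iterable of (chrom, start, end), 0-based half-open as in BED.
    chrom_sizes: dict mapping chromosome name to its length.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    out = []
    for chrom, size in chrom_sizes.items():
        pos = 0  # end of the covered stretch seen so far
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > pos:
                out.append((chrom, pos, start))  # gap before this interval
            pos = max(pos, end)  # also handles overlapping intervals
        if pos < size:
            out.append((chrom, pos, size))  # tail of the chromosome
    return out

# Toy coordinates for illustration only, not real capture probes.
capture = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 50, 60)]
sizes = {"chr1": 100, "chr2": 40}
blacklist = complement_intervals(capture, sizes)
```

Writing `blacklist` out as a three-column BED file gives something that can be passed as the blacklist when running computeGCBias.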