I ran VarScan copynumber on my high-coverage, whole-genome paired-end data (matched normal at 30x and tumor at 65x after duplicate removal) using default parameters, following the workflow on the VarScan webpage (copynumber, copyCaller, then DNAcopy segmentation based on the R code sample from Chris). Plotting the output with the DNAcopy plot function showed that the seg means are extremely noisy and over-segmented (they look like a cloud ranging from -2 to 2 on a log2 scale). Oddly enough, the chrUn chromosomes produce much more 'normal' looking plots, with a red seg mean line (instead of a cloud) that sits mostly around 0. I suspect the default parameters may not be appropriate for high-coverage, genome-wide data, but I'm not sure what to change. Should I modify --max-segment-size and/or --min-segment-size in varscan copycaller? Any ideas?
Yes, sorry, I should have included that in my post. I used --data-ratio to adjust for the difference in sequence coverage. All other options were default.
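For reference, here is a minimal sketch of how a value for --data-ratio could be derived from the depths quoted above. It assumes the ratio is taken as normal depth divided by tumor depth and that the post-deduplication depths (30x and 65x) are a fair proxy for the amount of input data; the post does not state the value actually used, and the function name is hypothetical.

```python
def coverage_data_ratio(normal_depth: float, tumor_depth: float) -> float:
    """Return normal/tumor depth ratio (assumed definition of --data-ratio)."""
    if normal_depth <= 0 or tumor_depth <= 0:
        raise ValueError("depths must be positive")
    return normal_depth / tumor_depth


# Depths quoted in the post: matched normal at 30x, tumor at 65x.
ratio = coverage_data_ratio(30, 65)

# Print the flag as it might be passed on the command line.
print(f"--data-ratio {ratio:.2f}")
```

If the two libraries differ in read length or mapping rate, a ratio based on uniquely mapped bases rather than nominal depth may be a better estimate.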