I have copy number data from cancer patients, both cancer and matched normal samples, that have already been segmented using DNAcopy. The problem is that the segmentation function in DNAcopy was run with min.width left at its default of 2, which produced many segments with very few markers that look more like noise than signal. My questions are:
1) Would there be any problem or concern with removing segments that have a low number of markers after segmentation?
2) What is the standard minimum number of markers per segment for copy number data, and is there any justification or reference for that minimum?
Is this for research or for clinical diagnostics? Did the source data come from array CGH, SNP array, or NGS?
For research on a cohort, see if you can compare your DNAcopy results to another assay run on the same samples and determine the minimum segment size at which the two assays agree. Then go ahead and remove segments below that size, and be sure to mention this step in the paper, since your research findings will then apply only to CNVs above that size.
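As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you have already exported the DNAcopy segments and annotated each one with whether the second assay confirmed it; the field names num_mark and confirmed, the function names, and the 0.9 concordance target are all my own placeholders, not anything defined by DNAcopy or by a standard.

```python
from collections import defaultdict


def concordance_by_size(segments):
    """Fraction of segments confirmed by the second assay, per marker count."""
    counts = defaultdict(lambda: [0, 0])  # num_mark -> [confirmed, total]
    for seg in segments:
        tally = counts[seg["num_mark"]]
        tally[0] += 1 if seg["confirmed"] else 0
        tally[1] += 1
    return {n: c / t for n, (c, t) in sorted(counts.items())}


def smallest_reliable_size(segments, min_concordance=0.9):
    """Smallest marker count N such that segments with >= N markers are
    confirmed at a rate of at least min_concordance (an assumed target).
    Returns None if no such N exists."""
    sizes = sorted({seg["num_mark"] for seg in segments})
    for n in sizes:
        subset = [s for s in segments if s["num_mark"] >= n]
        rate = sum(1 for s in subset if s["confirmed"]) / len(subset)
        if rate >= min_concordance:
            return n
    return None


def split_by_size(segments, min_markers):
    """Separate segments into those kept and those dropped by the size filter."""
    kept = [s for s in segments if s["num_mark"] >= min_markers]
    dropped = [s for s in segments if s["num_mark"] < min_markers]
    return kept, dropped
```

Keeping the dropped segments in their own list, rather than discarding them, makes it easy to report how many were removed when you describe the filter in the methods section.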
For diagnostics, if you've determined that segments below X probes are not reliable, or you've chosen to validate your assay only for segments of X probes or more, then you can remove the small segments. You'll need to denote those as no-call regions, though, and if any of them fall in a cancer-associated gene, let the clinician know and consider running another assay for that site.
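The no-call bookkeeping could look something like the sketch below. This is only an illustration under assumptions: the segment fields (chrom, start, end, num_mark), the gene region list, and the function names are hypothetical, and the threshold X is whatever your own validation supports.

```python
def overlaps(seg, region):
    """True if a segment and a gene region share at least one position."""
    return (
        seg["chrom"] == region["chrom"]
        and seg["start"] <= region["end"]
        and region["start"] <= seg["end"]
    )


def flag_no_calls(segments, min_probes, gene_regions):
    """Mark segments below min_probes as no-call instead of deleting them,
    and list any cancer-associated genes they overlap so those sites can be
    reported to the clinician or sent for a second assay."""
    annotated = []
    for seg in segments:
        out = dict(seg)  # copy so the input records are left unchanged
        out["no_call"] = seg["num_mark"] < min_probes
        out["genes_to_report"] = (
            [g["gene"] for g in gene_regions if overlaps(seg, g)]
            if out["no_call"]
            else []
        )
        annotated.append(out)
    return annotated
```

Annotating rather than deleting keeps the no-call regions visible in the final report, which is the point of the distinction above.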