I am using ExomeCNV to call copy numbers from whole exome sequencing data (tumor and normal), and I'd like to use GISTIC2.0 (standalone) to identify recurrent CNVs. The ExomeCNV output has the following columns:
- chr
- probe_start
- probe_end
- coverage
- targeted.base
- sequenced.base
- copy.number
- logR
- ratio
- spec
- sens
- average.coverage
What is the recommended way to use the ExomeCNV output to run GISTIC2.0?
More specifically, I'd like to know the answers to the following questions:
1) What should I use as Num markers (number of markers in segment)?
2) What should I use as Seg.CN (log2() -1 of copy number)?
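For context, here is a minimal sketch of how I am planning to write the segmentation file, assuming my reading of the GISTIC documentation is right that it wants six tab-delimited columns (Sample, Chromosome, Start Position, End Position, Num markers, Seg.CN). The helper names, the sample ID, the coordinates and the num_markers/seg_cn values are placeholders I made up for illustration; those two values are exactly what I am unsure about.

```python
import csv
import io

# Column names as I understand them from the GISTIC2.0 documentation.
GISTIC_COLUMNS = ["Sample", "Chromosome", "Start Position",
                  "End Position", "Num markers", "Seg.CN"]


def to_gistic_row(sample, rec, num_markers, seg_cn):
    """Map one ExomeCNV record (a dict keyed by ExomeCNV column names)
    to a six-field GISTIC row.

    num_markers and seg_cn are passed in by the caller because I do not
    yet know how they should be derived (questions 1 and 2).
    """
    return [
        sample,
        rec["chr"],          # passed through unchanged; "chr" prefix handling not checked
        rec["probe_start"],
        rec["probe_end"],
        num_markers,
        seg_cn,
    ]


def write_seg_file(rows, handle):
    """Write a header line plus the rows as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(GISTIC_COLUMNS)
    writer.writerows(rows)


# Placeholder record: only the keys come from ExomeCNV, the values are made up.
example = {"chr": "chr1", "probe_start": 1000, "probe_end": 2000}
row = to_gistic_row("tumor_01", example, num_markers=10, seg_cn=0.0)

buffer = io.StringIO()
write_seg_file([row], buffer)
print(buffer.getvalue())
```
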
For question 2, it seems I can calculate the Seg.CN value in two ways, based on the GISTIC documentation (ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm):
Approach #1. log2(2 * ratio) - 1
Approach #2. log2(copy.number) - 1
I think approach #1 is more accurate because it carries the raw ratio into the GISTIC input, rather than the discrete copy.number call. Approach #1 follows a post on using Sequenza output for GISTIC2.0 (http://crazyhottommy.blogspot.kr/2017/11/run-gistic2-with-sequenza-segmentation.html). In my ExomeCNV output, some entries are called as deletions (copy.number=1) even though the ratio is 0.945; I want to be more conservative and have these filtered out in GISTIC2.0.
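To make the difference concrete, here is a small sketch of the two calculations applied to the ratio = 0.945, copy.number = 1 case. The function names are my own, and the 0.1 cutoff is an assumption: I believe it matches the default GISTIC2.0 deletion threshold, but please correct me if that is wrong.

```python
import math

# Assumed deletion cutoff on the log2 scale (my understanding of the GISTIC default).
ASSUMED_DELETION_THRESHOLD = 0.1


def seg_cn_from_ratio(ratio):
    """Approach #1: log2(2 * ratio) - 1, which simplifies to log2(ratio)."""
    return math.log2(2 * ratio) - 1


def seg_cn_from_copy_number(copy_number):
    """Approach #2: log2(copy.number) - 1 (undefined for copy.number = 0)."""
    return math.log2(copy_number) - 1


def looks_deleted(seg_cn, threshold=ASSUMED_DELETION_THRESHOLD):
    """True if Seg.CN falls below the negative of the assumed cutoff."""
    return seg_cn < -threshold


approach_1 = seg_cn_from_ratio(0.945)
approach_2 = seg_cn_from_copy_number(1)

print(f"approach #1: {approach_1:.3f}, deleted: {looks_deleted(approach_1)}")
print(f"approach #2: {approach_2:.3f}, deleted: {looks_deleted(approach_2)}")
```

With approach #1 this entry comes out at about -0.082, which stays above the assumed cutoff, while approach #2 gives exactly -1 and would count as a deletion.
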
Thanks.