Dear all, I am using CNVkit to analyse exome sequencing data (germline samples) to find new copy number variants. For the pipeline I followed all the suggestions I could find for germline samples; here are the steps:
1) create a reference using all samples
cnvkit.py batch --normal ALL_BAM/*.bam --targets Exome_SureSelect_QXTV7_forCNVkit.bed --fasta hs37d5.fa --access access-5k-mappable.hg19.bed --output-reference reference.cnn
2) batch command using the previously created reference
cnvkit.py batch ALL_BAM/*.bam -r reference.cnn -d results/
3) add the confidence-interval (ci) columns with segmetrics
cd results/
mkdir -p call_cnvkit cnv_ztest
for i in *.cnr; do s=$(basename "$i" .cnr); cnvkit.py segmetrics "$s.cnr" -s "$s.cns" --ci; done
4) call command
for i in *.segmetrics.cns; do s=$(basename "$i" .segmetrics.cns); cnvkit.py call "$i" --filter ci -m clonal --center mode -o "call_cnvkit/$s.call.cns"; done
5) cnv_ztest
for i in *.cnr; do s=$(basename "$i" .cnr); cnv_ztest.py "$i" -t -s "call_cnvkit/$s.call.cns" -o "cnv_ztest/$s.ztest.cnr"; done
I am now looking at the cnv_ztest results, but I don't understand the log2 values reported in the files: they are strongly negative in all samples (roughly between -21 and -8). Is there something I am missing, or is there an error in my pipeline? I also had a look at this thread, but because my samples are germline I didn't use --drop-low-coverage as suggested there. Do you have any idea what is going wrong? Thanks in advance!
Stefania
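A quick diagnostic, offered as a sketch rather than a CNVkit feature: as far as I know, CNVkit gives bins with no reads a placeholder log2 of about -20, so values near -20 usually mean the bin had essentially no coverage. The function below (low_bins is a hypothetical helper name) assumes a CNVkit .cnr file whose header row contains log2 and depth columns, and prints coordinates, depth and log2 for every bin below a cutoff (default -5, an arbitrary choice).

```shell
# Print bins whose log2 ratio is below a cutoff, together with their read depth.
# Assumption: tab-separated CNVkit .cnr file with a header naming "log2" and "depth".
# $1: .cnr file; $2: log2 cutoff (defaults to -5, an arbitrary threshold)
low_bins() {
  awk -v cut="${2:--5}" 'BEGIN { FS = OFS = "\t" }
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column numbers
    $col["log2"] < cut { print $1, $2, $3, $col["depth"], $col["log2"] }' "$1"
}
# Usage (from the results/ directory):
#   for f in *.cnr; do echo "== $f"; low_bins "$f" -5; done
```

If the strongly negative bins all show a depth of 0, the values reflect missing coverage rather than a numerical problem in the calling steps.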
Hi Stefania,
are you sure that the same target enrichment kit was used for both cases and controls?
Hi German, yes, the target enrichment kit is the same for all samples.
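If a different or failed capture were involved, the affected samples would show many target bins with no reads at all. A minimal sketch for comparing samples on that, assuming the per-sample *.targetcoverage.cnn files that batch writes to the output directory (or the .cnr files) are tab-separated with a header containing a depth column; zero_depth_fraction is a hypothetical helper, not a CNVkit command.

```shell
# Fraction of bins with zero read depth in one CNVkit coverage table.
# Assumption: tab-separated file with a header row that names a "depth" column.
zero_depth_fraction() {
  awk 'BEGIN { FS = "\t" }
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "depth") d = i; next }
    { n++; if ($d + 0 == 0) z++ }                    # count all bins and zero-depth bins
    END { if (n) printf "%.4f\n", z / n; else print "NA" }' "$1"
}
# Usage:
#   for f in results/*.targetcoverage.cnn; do printf '%s\t' "$f"; zero_depth_fraction "$f"; done
```

A sample whose fraction is far above the others is a candidate for a capture or alignment problem, independent of which kit was declared.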
Strongly negative log2 values indicate homozygous deletions: they show up as (near-)zero coverage. For some reason your samples have no coverage in these regions.
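The arithmetic behind this, assuming a diploid reference and a pure (germline) sample: the estimated absolute copy number is 2 × 2^log2. The sketch below (log2_to_cn is a hypothetical helper) shows that log2 values between -8 and -21 correspond to far less than one copy, i.e. effectively zero.

```shell
# Convert a log2 copy ratio to an estimated absolute copy number.
# Assumption: diploid reference, pure sample, so copy number = 2 * 2^log2.
log2_to_cn() {
  awk -v l="$1" 'BEGIN { printf "%.3g\n", 2 * 2 ^ l }'
}
log2_to_cn 0     # 2        (normal diploid)
log2_to_cn -1    # 1        (heterozygous deletion)
log2_to_cn -8    # 0.00781
log2_to_cn -21   # 9.54e-07
```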
Having many recurrent homozygous deletions is very unlikely. Some genes, such as CDKN2A, are "famous" for that, but even so I would not expect more than 20% of samples to carry one.
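To check whether the same bins look "deleted" in many samples (which would point to a coverage or reference artifact rather than real homozygous deletions), one could count, per bin, how many samples fall below a log2 cutoff. A sketch assuming tab-separated .cnr files with a header naming a log2 column; recurrent_low_bins and the cutoff are hypothetical choices. Output columns: chromosome, start, end, number of samples below the cutoff, total number of samples.

```shell
# For each bin, count the samples whose log2 ratio is below a cutoff.
# Assumption: tab-separated CNVkit .cnr files with a header naming a "log2" column.
# $1: log2 cutoff; remaining arguments: .cnr files
recurrent_low_bins() {
  local cut="$1"; shift
  awk -v cut="$cut" 'BEGIN { FS = OFS = "\t" }
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "log2") c = i; nfiles++; next }
    $c < cut { hits[$1 OFS $2 OFS $3]++ }            # key each bin by its coordinates
    END { for (k in hits) print k, hits[k], nfiles }' "$@" |
    sort -t "$(printf '\t')" -k4,4nr -k1,1 -k2,2n    # most recurrent bins first
}
# Usage:
#   recurrent_low_bins -5 results/*.cnr | head
```

Bins that fall below the cutoff in nearly every sample are unlikely to be true germline deletions and are worth inspecting in the reference and the raw coverage.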