I'm using CNVkit to look for possible CNVs in a type of human amyloidosis, and since this is the first time I'm doing a CNV analysis, I have some doubts.
I ran CNVkit batch as follows, inside a bash loop, where index f points to the "tumor" .bam and index i to the "normal" .bam:
python cnvkit.py batch ${array[f]} --normal ${array[i]} --targets .../S07604624_Padded_versionCNVKIT.bed --fasta .../HG19/hg19.fa --access .../cnvkit-master/data/access-5k-mappable.hg19.bed --diagram --scatter --output-reference $outdirref --output-dir $outdir
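For reference, the same tumor/normal pairing can be sketched in Python as a dry run that only prints each batch command, which makes it easy to check that every tumor .bam is matched with the intended normal before anything is executed. The sample file names and the output layout are hypothetical, the "..." paths are left elided exactly as in the command above, and I'm assuming --output-reference should point to a reference file (.cnn) rather than a directory.

```python
import shlex

# Paths copied from the command above; the "..." parts are elided and
# must be replaced with real locations before running anything.
TARGETS = ".../S07604624_Padded_versionCNVKIT.bed"
FASTA = ".../HG19/hg19.fa"
ACCESS = ".../cnvkit-master/data/access-5k-mappable.hg19.bed"


def batch_command(tumor_bam, normal_bam, outdir_ref, outdir):
    """Return the argv list for one tumor/normal `cnvkit.py batch` run."""
    return [
        "python", "cnvkit.py", "batch", tumor_bam,
        "--normal", normal_bam,
        "--targets", TARGETS,
        "--fasta", FASTA,
        "--access", ACCESS,
        "--diagram", "--scatter",
        "--output-reference", outdir_ref,
        "--output-dir", outdir,
    ]


def paired_commands(pairs, out_root):
    """Build one command per (tumor, normal) pair, each with its own output paths."""
    cmds = []
    for n, (tumor, normal) in enumerate(pairs, start=1):
        out = f"{out_root}/pair{n}"
        ref = f"{out}/reference.cnn"  # assumed: a file path for the new reference
        cmds.append(batch_command(tumor, normal, ref, out))
    return cmds


# Hypothetical file names, for illustration only.
pairs = [("RS_7_tumor.bam", "RS_7_normal.bam"),
         ("RS_9_tumor.bam", "RS_9_normal.bam")]
for cmd in paired_commands(pairs, "results"):
    print(shlex.join(cmd))  # dry run: print instead of executing
```

Once the printed pairs look right, each list could be passed to subprocess.run instead of being printed.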
Then I called copy numbers with a purity of 90%, which is the value I was given by the people who did the exome sequencing:
python .../cnvkit.py call resultsRS_7_tumor_recalibrated.cns --purity 0.9 -o output_7.cns
I chose two of my 20 samples and checked the number of CNVs. The first thing I see is that sample 9 has 2403 rows (segments) and sample 7 has 705. I understand that this is due to the CBS algorithm, which, if I'm not wrong, joins contiguous bins with similar log2 ratios into segments. This would mean that sample 7 has more homogeneous log2 ratios across contiguous bins. Is that right?
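Since each row of a .cns file is one segment, not necessarily a CNV, a quick way to see where the 2403 vs 705 segments come from is to count rows per chromosome. This is a minimal sketch, assuming a tab-separated .cns with a header row that includes a chromosome column; the demo table is made up.

```python
import csv
import io
from collections import Counter


def segments_per_chromosome(handle):
    """Count rows (segments) per chromosome in a CNVkit .cns table.

    Assumes tab-separated input with a header row containing a
    'chromosome' column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["chromosome"] for row in reader)


# Tiny made-up table standing in for a real .cns file.
demo = io.StringIO(
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr1\t10000\t500000\t-\t0.02\n"
    "chr1\t500000\t900000\t-\t-0.41\n"
    "chr2\t10000\t800000\t-\t0.01\n"
)
counts = segments_per_chromosome(demo)
print(counts)                # Counter({'chr1': 2, 'chr2': 1})
print(sum(counts.values()))  # 3
```

With a real file, pass an open handle instead, e.g. open("output_7.cns"). If the extra segments in sample 9 are spread evenly over every chromosome, that may point more toward noisier coverage than toward many real events, though this is a heuristic rather than a rule.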
Then I plotted a histogram of the CN column, and, maybe because this is the first time I'm doing this kind of analysis, I don't trust the results.
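A histogram of the CN column counts segments, so a 1 kb segment weighs as much as a 50 Mb one. The sketch below tallies a called .cns both by number of segments and by base pairs covered. It assumes tab-separated columns named start, end and cn (cn being the column added by cnvkit.py call); the demo rows are made up.

```python
import csv
import io
from collections import defaultdict


def cn_summary(handle):
    """Tally a called .cns table by copy number.

    Returns {cn: (n_segments, total_bp)}, sorted by cn. Assumes
    tab-separated columns 'start', 'end' and an integer 'cn'.
    """
    segs = defaultdict(int)
    bp = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        cn = int(row["cn"])
        segs[cn] += 1
        bp[cn] += int(row["end"]) - int(row["start"])
    return {cn: (segs[cn], bp[cn]) for cn in sorted(segs)}


# Made-up rows: two long neutral segments and two short events.
DEMO_TEXT = (
    "chromosome\tstart\tend\tlog2\tcn\n"
    "chr1\t0\t1000000\t0.01\t2\n"
    "chr1\t1000000\t1001000\t-0.9\t1\n"
    "chr1\t1001000\t1002000\t0.8\t3\n"
    "chr2\t0\t2000000\t-0.02\t2\n"
)
for cn, (n, length) in cn_summary(io.StringIO(DEMO_TEXT)).items():
    print(f"CN={cn}\tsegments={n}\tbp={length}")
```

In the demo, half of the segments are non-neutral, yet they cover only 2 kb out of about 3 Mb, which is why the two views can tell very different stories.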
Most of the segments in the .cns file come out as "duplicated", i.e. with a CN of 2. Does a CN of 2 mean the normal level in a human sample, or does it mean that the segment has twice as many reads as expected?
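To see how CN relates to the log2 ratio, here is a simplified mixture model, not CNVkit's own code: a fraction purity of the cells carries the tumor copy number, the rest carries the neutral ploidy (2 for autosomes), and the reference is normalized so that the neutral ploidy gives log2 = 0. Under these assumptions CN 2 corresponds to log2 = 0 at any purity.

```python
import math


def expected_log2(cn, purity=1.0, ploidy=2):
    """Expected log2 ratio for a tumor copy number under a simple mixture model.

    Assumes the non-tumor fraction (1 - purity) sits at the neutral ploidy
    and that the neutral ploidy gives a ratio of 1 (log2 = 0).
    """
    ratio = (purity * cn + (1 - purity) * ploidy) / ploidy
    return math.log2(ratio) if ratio > 0 else float("-inf")


def log2_to_cn(log2_ratio, purity=1.0, ploidy=2):
    """Invert expected_log2: estimate tumor copy number from an observed log2."""
    ratio = 2 ** log2_ratio
    cn = ploidy * (ratio - (1 - purity)) / purity
    return max(cn, 0.0)  # clip: noise can push the estimate below zero


for cn in range(5):
    print(cn, round(expected_log2(cn, purity=0.9), 3))
print(log2_to_cn(0.0, purity=0.9))  # 2.0: log2 = 0 is copy-neutral
```

With purity 0.9, a single-copy loss is expected at a log2 of about -0.86 rather than -1, and a single-copy gain at about 0.54 rather than 0.58, so the purity setting mostly shifts where the CN boundaries fall.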
In either case, the distribution I get looks very strange to me. What might I be doing wrong?
Thanks for your time!