cnvkit output understanding
7.9 years ago
Folder40g ▴ 190

Hi

I'm using CNVkit to check for possible CNVs in a kind of amyloidosis (human). This is the first time I'm doing a CNV analysis, so I have some doubts.

I ran CNVkit as follows inside a bash loop, where f is the index of the "tumor" .bam and i the index of the "normal" .bam:

python cnvkit.py batch ${array[f]} --normal ${array[i]} --targets .../S07604624_Padded_versionCNVKIT.bed --fasta .../HG19/hg19.fa --access .../cnvkit-master/data/access-5k-mappable.hg19.bed --diagram --scatter --output-reference $outdirref --output-dir $outdir

Then I used a purity of 90%, as I was told by the people who did the exome sequencing:

python .../cnvkit.py call resultsRS_7_tumor_recalibrated.cns --purity 0.9 -o output_7.cns

I chose two of the 20 samples that I have and checked the number of CNVs. The first thing I see is that sample 9 has 2403 rows and sample 7 has 705. I understand that this is due to the CBS algorithm, which, if I'm not wrong, tries to join contiguous bins with similar log2 ratios. This means that sample 7 has a more homogeneous log2 ratio across contiguous bins. Right?

Then I plotted a histogram of the CN column. Maybe it's because this is the first time I'm doing this kind of analysis, but I don't trust the results.

Most of the segments in the .cns seem to be duplicated. Does a CN of 2 mean normal levels in a human sample, or does it mean that the segment has twice as many reads as expected?

In either case, the distributions shown below look too weird to me. So what might I be doing wrong?

[Figures: histograms of the CN column for Sample 9 and Sample 7]

Thanks for your time!

7.9 years ago
Eric T. ★ 2.8k

Yes, your understanding of segmentation is right. You can plot the outputs with the scatter command for a more intuitive view of each sample's copy number profile, or heatmap to see all samples together. The integers in the CN column are the estimated absolute copy number of each segment, so 2 is neutral for diploid chromosomes. (But it might not be for sex chromosomes.)
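To make the relationship between the log2 column, the purity setting and the integer CN concrete, here is a minimal sketch of the usual mixture arithmetic. It is an illustration only, not CNVkit's own code: the function name `log2_to_copy_number` and its parameters are hypothetical, and it assumes a diploid normal and simple rounding to the nearest integer.

```python
import math


def log2_to_copy_number(log2_ratio, purity=1.0, normal_copies=2):
    """Estimate the integer tumor copy number behind an observed log2 ratio.

    Hypothetical helper for illustration; assumes the contaminating normal
    cells carry `normal_copies` copies of the region.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average copies per cell across the tumor/normal mixture.
    observed = normal_copies * 2.0 ** log2_ratio
    # Subtract the share contributed by normal cells, then rescale to tumor only.
    tumor = (observed - (1.0 - purity) * normal_copies) / purity
    # Copy number cannot be negative; report the nearest integer.
    return max(0, int(round(tumor)))


# A log2 ratio of 0 means "same coverage as the reference", i.e. CN = 2.
print(log2_to_copy_number(0.0))
# A single-copy loss in a 90% pure sample shows up as log2(1.1 / 2), not -1.
print(log2_to_copy_number(math.log2(1.1 / 2), purity=0.9))
```

So a CN of 2 corresponds to a log2 ratio near 0, not to a doubling of reads; a doubling (log2 ratio near +1) would come out as CN = 4 under these assumptions.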

Sample 7 looks about how I would expect, mostly neutral calls (CN=2), some single-copy losses (1) and gains (3), a few multi-copy gains (4) and homozygous deletions (0). Sample 9 is noisier, and shows a surprising number of higher-level amplifications that could be false positives. Other than that it's not wildly unexpected. Look at any quality control metrics you have to see if Sample 9 and any other samples in your cohort have low coverage or are otherwise expected to be problematic.
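A quick way to compare samples along these lines is to tally the CN values per file and look at the fraction of neutral segments. The sketch below assumes the called .cns is tab-separated with a header row containing a `cn` column; the helper names and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_copy_numbers(cns_handle, column="cn"):
    """Count how many segments carry each integer copy number."""
    reader = csv.DictReader(cns_handle, delimiter="\t")
    return Counter(int(row[column]) for row in reader)


def fraction_neutral(counts, neutral=2):
    """Share of segments called at the neutral copy number."""
    total = sum(counts.values())
    return counts.get(neutral, 0) / total if total else 0.0


# Made-up rows standing in for a real file; with real data use
# open("output_7.cns") instead of io.StringIO.
example = io.StringIO(
    "chromosome\tstart\tend\tlog2\tcn\n"
    "chr1\t100\t2000\t0.01\t2\n"
    "chr1\t2000\t5000\t-0.85\t1\n"
    "chr2\t100\t9000\t0.02\t2\n"
    "chr3\t100\t4000\t0.55\t3\n"
)
counts = tally_copy_numbers(example)
print(sorted(counts.items()))
print(fraction_neutral(counts))
```

A sample whose neutral fraction is much lower than the rest of the cohort, or whose counts at CN of 4 and above are unusually high, is a candidate for a closer look at its QC metrics.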

You can make the segmentation more specific and less sensitive by using -t .00001 or, if that's not working out, -m haar. You can also reduce noise earlier in the process by using a pooled reference if the samples in your research cohort were prepared and sequenced with the same process (exome baits, capture kits, etc.) -- use the batch command to do this in one shot. In my benchmarks, a pooled reference usually performed noticeably better than running tumor-normal pairs independently.
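As a rough sketch of what the pooled-reference run and a stricter re-segmentation could look like, the helpers below only assemble the command lines; they do not run anything. The function names and all file names are hypothetical placeholders, and the options are the ones already mentioned in this thread (`--normal`, `--targets`, `--fasta`, `--access`, `--output-reference`, `--output-dir` for batch; `-t`, `-m` and `-o` for segment).

```python
def pooled_batch_command(tumor_bams, normal_bams, targets, fasta, access,
                         reference_out, out_dir, cnvkit="cnvkit.py"):
    """Build one `batch` call that pools every normal into a single reference."""
    return (
        ["python", cnvkit, "batch", *tumor_bams,
         "--normal", *normal_bams,
         "--targets", targets,
         "--fasta", fasta,
         "--access", access,
         "--output-reference", reference_out,
         "--output-dir", out_dir]
    )


def segment_command(cnr, cns_out, threshold=None, method=None,
                    cnvkit="cnvkit.py"):
    """Build a `segment` call with an optional stricter threshold or method."""
    cmd = ["python", cnvkit, "segment", cnr, "-o", cns_out]
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    if method is not None:
        cmd += ["-m", method]
    return cmd


# Placeholder file names; substitute your own paths.
batch = pooled_batch_command(
    ["tumor_7.bam", "tumor_9.bam"], ["normal_7.bam", "normal_9.bam"],
    "targets.bed", "hg19.fa", "access.bed", "pooled_reference.cnn", "results",
)
print(" ".join(batch))
print(" ".join(segment_command("tumor_9.cnr", "tumor_9.cns", threshold=0.00001)))
```

To actually run one of these, pass the list to `subprocess.run(cmd, check=True)`. The point of the single batch call is that all normals go after `--normal` together, rather than looping over tumor-normal pairs.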

7.9 years ago
Folder40g ▴ 190

Thanks. I'll give the pooled reference a shot and see how it goes.
