Question

The center in scatter plot generated by CNVkit looks off

5.3 years ago

Jordan ★ 1.3k

Hi,

I ran the CNVkit pipeline on WGS samples. I have 4 tumor/normal pairs and pooled the normals.

Here is the command I used:
cnvkit.py batch -p $OMP_NUM_THREADS $BAMs/*_T*.bam -n $BAMs/*_B*.bam -m wgs -f $refs --annotate $refFlat --output-reference $out/project.cnn --output-dir $out

I dropped low coverage reads using the following command:

cnvkit.py segment $file -o $out/drop_low_cov/${sample}.cns

But my scatter plot looks quite weird. The Y chromosome has too many deletions, and in general it looks like the deletions are on a much larger scale.

Is there a way to address this?

Here is the plot

Thanks for the help!

cnvkit wgs scatter-plot • 2.0k views

ADD COMMENT • link updated 5.3 years ago by d-cameron ★ 3.0k • written 5.3 years ago by Jordan ★ 1.3k

What sex is your sample?

ADD REPLY • link 5.3 years ago by d-cameron ★ 3.0k

These are female samples.

ADD REPLY • link 5.3 years ago by Jordan ★ 1.3k

So why are you worried about the CN profile on the Y chromosome? There is no Y in your samples, so everything you're seeing can be explained by reads coming from one of the pseudoautosomal regions or from a region that has homology with an autosome.

ADD REPLY • link 5.3 years ago by d-cameron ★ 3.0k

All the samples are female. I was a bit worried to see the Y chromosome having so many deletions even though both the normals and the tumors are female samples. Other papers I have seen do not show such extensive deletions on the Y chromosome either.

ADD REPLY • link 5.3 years ago by Jordan ★ 1.3k

Do these other papers completely ignore Y for female samples? I know many pipelines just throw out anything on Y once the sample has been determined to be female. More sophisticated pipelines have extra logic to handle less common scenarios such as Down and Klinefelter syndromes because, if you have a large cohort, you'll almost certainly encounter them.

ADD REPLY • link 5.3 years ago by d-cameron ★ 3.0k

score 0 · Answer 1 · 2020-04-07

This appears to be a sex determination issue. As per the CNVkit documentation:

"By default, copy number calls and log2 ratios will be relative to a diploid X chromosome and haploid Y."

This can be adjusted if you know the sex of your sample (or you want CNVkit to predict for you). See https://cnvkit.readthedocs.io/en/stable/sex.html for more details.
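To make the arithmetic behind that default concrete, here is a minimal Python sketch. It is not CNVkit code: the copy-number tables and the `expected_log2` helper are illustrative assumptions, and it only shows the log2 ratio you would expect on each chromosome when a sample is compared against a diploid-X, haploid-Y reference.

```python
import math

# Assumed germline copy numbers per chromosome class (illustrative only).
EXPECTED_COPIES = {
    "female": {"autosome": 2, "chrX": 2, "chrY": 0},
    "male": {"autosome": 2, "chrX": 1, "chrY": 1},
}

# The default described in the quoted documentation: diploid X, haploid Y.
DEFAULT_REFERENCE = {"autosome": 2, "chrX": 2, "chrY": 1}


def expected_log2(sample_copies, reference_copies):
    """Expected log2 ratio of sample copies to reference copies."""
    if sample_copies == 0:
        # No copies at all: the ratio is 0, so the log2 is unbounded below.
        return float("-inf")
    return math.log2(sample_copies / reference_copies)


for sex, copies in EXPECTED_COPIES.items():
    for chrom, n in copies.items():
        ratio = expected_log2(n, DEFAULT_REFERENCE[chrom])
        print(f"{sex:6s} {chrom:8s} expected log2 = {ratio}")
```

Under these assumptions a female sample has an unbounded negative expectation on Y. In real data the few reads that do map to Y keep the values finite but strongly negative, which would plausibly show up as the chromosome-wide "deletion" in the plot.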

"The Y chromosome has too many deletions, and in general it looks like the deletions are on a much larger scale. Is there a way to address this?"

In general, a deletion should have both a CN loss in the deleted region and a breakpoint that spans it. If you want to do a comprehensive genomic rearrangement assessment of your tumour samples, I would suggest the GRIDSS/PURPLE/LINX pipeline [shameless plug disclaimer - I'm first author of that preprint]. On our cohort, 2.1% of somatic CN transitions (excluding those at centromeres and reference gaps) have no explanatory SV. A few samples fall in the 20-40% range; these show signs of DNA degradation, hence the higher CN false positive rate.
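The "CN transition without an explanatory SV" idea can be sketched roughly as follows. This is a toy illustration, not how GRIDSS/PURPLE/LINX implement it: the function name, the `(chromosome, position)` tuples, the 1 kb tolerance and the example coordinates are all made up for the example.

```python
from bisect import bisect_left


def unexplained_fraction(transitions, breakends, tolerance=1000):
    """Fraction of CN transitions with no SV breakend within `tolerance` bp.

    Both inputs are lists of (chromosome, position) tuples (hypothetical format).
    """
    # Group breakend positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in breakends:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    unexplained = 0
    for chrom, pos in transitions:
        positions = by_chrom.get(chrom, [])
        # First breakend at or after the left edge of the tolerance window.
        i = bisect_left(positions, pos - tolerance)
        if i == len(positions) or positions[i] > pos + tolerance:
            unexplained += 1
    return unexplained / len(transitions) if transitions else 0.0


# Made-up coordinates: two transitions have a nearby breakend, two do not.
transitions = [("chr1", 1_000_000), ("chr1", 2_500_000),
               ("chr2", 700_000), ("chr3", 40_000)]
breakends = [("chr1", 1_000_200), ("chr1", 2_499_500), ("chr2", 900_000)]

print(unexplained_fraction(transitions, breakends))  # 0.5
```

A high unexplained fraction in a sample would be a hint that many of its CN "losses" are coverage artefacts rather than real deletions.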