Interpretation of sex chromosome ploidy using CNVKit
8.7 years ago
andcl93 • 0

I was using CNVkit to visualize ploidy in whole-exome sequencing (WES) data from male samples.

The first time, I ran CNVkit without the -y option. The resulting scatter plot showed a copy number ratio of 0 for all autosomes, -1 for chromosome X, and 0 for chromosome Y. On the second run I added the -y option (male reference), and the result changed: both X and Y showed a copy number ratio of 0.

These results raised two questions:

  1. For autosomes, the normal copy number ratio would be log2(2/2) = 0. Without the -y option, is it normal for the X chromosome to end up at log2(1/2) = -1? I ask because chromosome X showed a ratio of 0 when the -y option was on. Is there a compensation algorithm for males, since they have only one X chromosome?

  2. If the Y chromosome copy number ratio is 0, does that mean the sample is XYY? Or is there another algorithm that compensates for the Y chromosome, which is present in only a single copy in normal males and should therefore look like log2(1/2) = -1? (Y showed 0 in both cases, with and without the -y option.)

cnvkit sex chromosome • 3.4k views
8.6 years ago
Eric T. ★ 2.8k

Running the "batch" command without the -y flag assumes a female reference, with expected/neutral ploidy of 2 for autosomes, 2 for X, and 1 for Y (for the sake of having a baseline level for comparing male samples). Using this reference, male samples without sex-chromosome abnormalities will have 1 X, 1 Y, so the log2 ratios you'll see are log2(1/2) = -1 for X, log2(1/1) = 0 for Y. A female sample will have log2(2/2) = 0 for X, log2(0/1) = -infinity (in practice, just some noisy deep negative values) for Y.

When you rerun "batch" with the -y flag, the expected ploidies of the sex chromosomes are 1 for X and 1 for Y. With a male reference, a normal male sample will have log2(1/1) = 0 for both X and Y; a normal female sample will have log2(2/1) = +1 for X and log2(0/1) = -infinity (very low numbers) for Y.

So, in both cases what you're seeing is as expected. Your male samples are normal XY.
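The ratios described in this answer can be checked with a short Python sketch. It simply computes log2(observed/expected) using the expected ploidies stated above (female reference: X=2, Y=1; male reference with -y: X=1, Y=1). The names `REFERENCE_PLOIDY`, `SAMPLE_COPIES`, `expected_log2` and `ratio_table` are hypothetical and not part of CNVkit; a copy count of 0 is reported here as -infinity, whereas real CNVkit output shows noisy, deeply negative values instead.

```python
import math

# Expected (neutral) copy number per chromosome class for each reference type,
# as described in the answer. Hypothetical names, not CNVkit's API.
REFERENCE_PLOIDY = {
    "female": {"autosome": 2, "X": 2, "Y": 1},  # default "batch" run
    "male": {"autosome": 2, "X": 1, "Y": 1},    # "batch -y"
}

# Actual copy numbers in karyotypically normal samples.
SAMPLE_COPIES = {
    "male": {"autosome": 2, "X": 1, "Y": 1},
    "female": {"autosome": 2, "X": 2, "Y": 0},
}


def expected_log2(observed: int, expected: int) -> float:
    """Return log2(observed/expected); a copy count of 0 gives -inf."""
    if observed == 0:
        return -math.inf
    return math.log2(observed / expected)


def ratio_table(sample: str, reference: str) -> dict:
    """Expected log2 ratio per chromosome class for a sample vs. a reference."""
    ref = REFERENCE_PLOIDY[reference]
    return {
        chrom: expected_log2(copies, ref[chrom])
        for chrom, copies in SAMPLE_COPIES[sample].items()
    }


if __name__ == "__main__":
    for reference in ("female", "male"):
        for sample in ("male", "female"):
            print(f"{sample} sample vs {reference} reference:",
                  ratio_table(sample, reference))
```

A normal male against the female reference gives X = -1 and Y = 0, and against the male reference gives 0 for both, which matches what the question reports.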

Hi Etal, that was very interesting! Do you know if there is any option to set a different ploidy for normal samples? And if we are not sure about the ploidy of the normal sample, is it correct to use the rescale command to improve the visualization? Thank you in advance.

I'm not sure whether you mean the gender of the normal samples used to construct the reference, or the overall ploidy of some non-human species. The reference command checks the chromosomal gender of the input samples and adjusts automatically so that the reference is effectively male or female (with or without -y) even if the input samples are a mix of both genders.

For non-diploid species the log2 ratios are independent of ploidy, but when you use the call command to output absolute integer copy numbers, you can provide --ploidy as an argument.
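As a rough illustration of how a ploidy value turns log2 ratios into integer copy numbers, the sketch below rounds ploidy × 2^log2. This is a simplification that assumes a 100%-pure sample and no reference-sex adjustment; CNVkit's call command offers several calling methods and extra options, so treat this as the underlying idea rather than its implementation. `absolute_copies` is a hypothetical helper.

```python
def absolute_copies(log2_ratio: float, ploidy: int = 2) -> int:
    """Convert a log2 copy ratio to an integer copy number.

    Simplified sketch: assumes a pure sample and that a log2 ratio of 0
    corresponds to `ploidy` copies (the neutral level).
    """
    return max(0, round(ploidy * 2 ** log2_ratio))


if __name__ == "__main__":
    # The same log2 ratio of -1 (half the neutral level) in two genomes.
    print(absolute_copies(-1.0, ploidy=2))   # 1
    print(absolute_copies(-1.0, ploidy=4))   # 2
    # log2(3/2) is about 0.585, i.e. one extra copy in a diploid genome.
    print(absolute_copies(0.585, ploidy=2))  # 3
```

This is why the log2 ratios themselves do not need to know the ploidy: it only matters when converting them to absolute counts.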

All of the features of the rescale command are now included in call, so I recommend using call instead. Look at the --ploidy and --center options, in particular.
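Centering is the step that addresses an overall shift of the log2 ratios away from 0. The sketch below shows the idea with the median as the central estimate; CNVkit's documentation lists several estimators for --center, and median is used here only for illustration. `recenter` is a hypothetical helper, and it assumes most of the genome is copy-neutral, so the median log2 value should sit at 0.

```python
import statistics


def recenter(log2_values: list[float]) -> list[float]:
    """Shift log2 ratios so that their median becomes 0 (neutral).

    Illustrative only: assumes the majority of bins are copy-neutral.
    """
    shift = statistics.median(log2_values)
    return [value - shift for value in log2_values]


if __name__ == "__main__":
    # Hypothetical bins that all sit roughly 0.3 below neutral,
    # plus one loss (-1.3) and one gain (0.7).
    skewed = [-0.3, -0.32, -0.28, -1.3, -0.31, 0.7]
    print(recenter(skewed))
```

After recentering, the bulk of the bins sits around 0 while the loss and gain keep their relative distance from it.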

The problem I have to deal with is that the only normal sample available for comparison is sequenced liver tissue, which is known to be potentially polyploid. I know it is not a good idea to use it as a normal sample, but it's my only source of normal tissue. I also know that I can run the analysis without a normal, but I want to at least give it a try. So, how can I use the --ploidy or --center argument here? All my data are consistently below the neutral value of 0 (presumably due to the polyploidy of the normal tissue), and rescaling corrected the visualization. Is this approach correct?

Polyploidy of the normal tissue shouldn't affect the log2 ratios if it's the same ploidy on all autosomes. Aneuploidy of the normal tissue will give confusing results, though.
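A minimal sketch of why uniform polyploidy cancels out: if each sample's bin depths are normalized to that sample's own median before taking log2(tumor/normal), multiplying every normal bin by the same factor changes nothing. This is a toy model with made-up depths; CNVkit's real normalization (reference averaging, bias corrections) is more involved, and `log2_ratios` is a hypothetical helper.

```python
import math
import statistics


def log2_ratios(tumor: list[float], normal: list[float]) -> list[float]:
    """Median-normalize each sample's bin depths, then take log2(tumor/normal)."""
    t_med = statistics.median(tumor)
    n_med = statistics.median(normal)
    return [math.log2((t / t_med) / (n / n_med)) for t, n in zip(tumor, normal)]


if __name__ == "__main__":
    tumor = [100.0, 110.0, 50.0, 95.0, 105.0]  # third bin: one-copy loss
    diploid_normal = [100.0, 100.0, 100.0, 100.0, 100.0]
    # A uniformly tetraploid normal scales every bin by the same factor.
    tetraploid_normal = [2 * d for d in diploid_normal]
    print(log2_ratios(tumor, diploid_normal))
    print(log2_ratios(tumor, tetraploid_normal))  # same values as above
```

An aneuploid normal, by contrast, would scale only some bins, and those bins would shift relative to the rest, which is the confusing case mentioned above.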

If all or the majority of your segment means are below 0, then the problem is that too many regions had very low coverage, which skews the log2 ratios toward negative numbers (it will be closer to normally distributed if coverage is good and consistent). Try the --drop-low-coverage option to the segment command to remove the values that are causing at least some of the trouble. You may also want to re-run the pipeline with a larger off-target bin size (e.g. batch --antitarget-avg-size 200000).
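To see why a few poorly covered bins drag results toward negative numbers, note that the log2 scale is unbounded below: a bin with almost no reads lands near -17 or -20, while a well-covered bin sits close to 0. The depths below are made up for illustration, and a plain mean stands in for a segment mean (CNVkit's segmentation algorithms are more sophisticated); the sketch only shows the direction of the skew.

```python
import math
import statistics

# Hypothetical per-bin depths for one copy-neutral region, relative to the
# reference (1.0 = neutral). The last three bins captured almost no reads.
relative_depths = [1.05, 0.97, 1.02, 0.99, 1.01, 0.96, 1.03, 1e-5, 1e-6, 1e-6]

log2_values = [math.log2(d) for d in relative_depths]

# The well-covered bins sit near 0, but the near-empty bins land far below,
# pulling the mean strongly negative while the median barely moves.
print(f"mean   = {statistics.mean(log2_values):.2f}")
print(f"median = {statistics.median(log2_values):.2f}")
```

Removing such near-null bins before segmentation is what brings the segment means back toward 0 in a copy-neutral region.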

So, you're suggesting that the extremely negative log2 ratios (-12 to -20) I am getting are due to uniformly poor coverage or to non-specific target baits, right?

I tried the --drop-low-coverage argument and indeed my data look smoother. But my objection is that removing low-coverage regions must introduce a bias against losses, unless this option removes low-coverage regions in both the normal and the tumor samples. I was also wondering whether there is an argument to specify the minimum number of reads aligned in a bin for that region to be included in or excluded from the analysis.

Yes, that's it. The log2 scaling actually introduces a bias toward losses in low-coverage regions, which --drop-low-coverage counteracts. The tumor sample itself is a mix of tumor and some normal DNA, so any bins that were adequately captured in the normal cells within the tumor sample will have a log2 value above the --drop-low-coverage threshold. But if you're running germline samples, don't use that flag, or you'll screen out true homozygous deletions.
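The purity argument can be made concrete with a simple mixture model: if the tumor cells carry a homozygous deletion and the contaminating normal cells keep 2 copies, the bin's log2 ratio is log2(1 - purity). Even at very high purity this stays far above the -15 cutoff given later in this thread, whereas a germline homozygous deletion has essentially no reads at all. This is an idealized sketch (no noise, diploid normal cells), and `tumor_deletion_log2` is a hypothetical helper.

```python
import math


def tumor_deletion_log2(purity: float) -> float:
    """log2 ratio of a bin with a homozygous deletion in the tumor cells only.

    Assumes diploid normal cells keep 2 copies and tumor cells have 0, so the
    mixed sample carries (1 - purity) * 2 copies against an expected 2.
    """
    if not 0 <= purity < 1:
        raise ValueError("purity must be in [0, 1); at 1 the bin has no reads")
    copies = (1 - purity) * 2
    return math.log2(copies / 2)


if __name__ == "__main__":
    for purity in (0.3, 0.6, 0.9, 0.99):
        print(f"purity {purity:.2f}: log2 = {tumor_deletion_log2(purity):.2f}")
```

Even a 99%-pure tumor only reaches about -6.6, so filtering at -15 does not remove real somatic deletions in this model.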

Thank you so much, Etal! Do you know what the threshold for --drop-low-coverage is?

It's -15 (in log2 scale).
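Applying that cutoff amounts to a simple filter on the per-bin log2 values. The sketch below keeps bins at or above -15 and drops the rest; the bin layout (chromosome, start, end, log2 tuples) and the name `drop_low_coverage` are assumptions for illustration, and CNVkit's own option may apply additional criteria beyond this single threshold.

```python
# Threshold taken from the reply above; helper name and bin layout are hypothetical.
LOW_COVERAGE_LOG2 = -15.0


def drop_low_coverage(bins):
    """Keep only bins whose log2 value is at or above the -15 cutoff.

    `bins` is an iterable of (chromosome, start, end, log2) tuples.
    """
    return [b for b in bins if b[3] >= LOW_COVERAGE_LOG2]


if __name__ == "__main__":
    bins = [
        ("chr1", 10000, 10200, -0.12),
        ("chr1", 10500, 10700, -20.0),  # near-null value: dropped
        ("chr1", 11000, 11200, -3.3),   # deep but plausible loss: kept
    ]
    for b in drop_low_coverage(bins):
        print(b)
```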
