I was using CNVkit to visualize ploidy in whole-exome sequencing data (male samples).
The first time, I ran CNVkit without the -y option, which gave me a scatter plot showing a copy number ratio (log2) of 0 for all autosomes. However, the copy number ratio was -1 for chromosome X and 0 for chromosome Y. On the second run, I added the -y option for a male reference, which gave a different result: a copy number ratio of 0 for both X and Y.
These results raised two questions for me:
For autosomes, the normal copy number ratio would be log2(2/2) = 0. Is it normal for the X chromosome to end up at log2(1/2) = -1 without the -y option? I ask because chromosome X showed a copy number ratio of 0 when the -y option was on. Is there a compensation algorithm for males because they have only one X chromosome?
And if the Y chromosome copy number ratio is 0, does that mean the sample is XYY, or is there another algorithm that compensates the Y chromosome copy number ratio? Y is present in a single copy in normal males, so I would expect log2(1/2) = -1 in normal male samples. (It showed 0 in both cases, with and without the -y option.)
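The arithmetic behind these two questions can be sketched in a few lines of Python. This is only an illustration, not CNVkit code: `expected_log2` is a hypothetical helper, and the reference copy numbers (two copies of X without -y, one copy of X with -y, one copy of Y either way) are assumptions chosen to match the plots described above.

```python
import math

def expected_log2(sample_copies, reference_copies):
    """Expected log2 ratio of a region, given copy numbers in sample and reference."""
    return math.log2(sample_copies / reference_copies)

# Normal male sample: 2 copies of each autosome, 1 copy of X, 1 copy of Y.
male = {"autosome": 2, "X": 1, "Y": 1}

# Assumed reference copy numbers (hypothetical, chosen to match the observed plots).
ref_without_y = {"autosome": 2, "X": 2, "Y": 1}
ref_with_y = {"autosome": 2, "X": 1, "Y": 1}

for label, ref in [("without -y", ref_without_y), ("with -y", ref_with_y)]:
    ratios = {chrom: expected_log2(male[chrom], ref[chrom]) for chrom in male}
    print(label, ratios)
```

Under these assumptions, a normal male sample sits at -1 on X only when the reference carries two copies of X, and at 0 everywhere else, which is the pattern reported above.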
Hi Etal, that was very interesting! Do you know if there is an option to set a different ploidy for normal samples? Thank you in advance. If we are not sure about the normal sample's ploidy, is it correct to use the rescale command to improve visualization?
I'm not sure whether you mean the gender of the normal samples used to construct the reference, or the overall ploidy of some non-human species. The reference command checks the chromosomal gender of the input samples and adjusts automatically so that the reference is effectively male or female (with or without -y) even if the input samples are a mix of both genders.
For non-diploid species the log2 ratios are independent of ploidy, but when you use the call command to output absolute integer copy numbers, you can provide --ploidy as an argument.
All of the features of the rescale command are now included in call, so I recommend using call instead. Look at the --ploidy and --center options, in particular.
The problem I have to deal with is that the only normal sample I have for comparison is sequenced liver tissue, which is known to be potentially polyploid. I know it is not a good idea to use it as the normal sample, but it's my only source of normal tissue. I also know that I can run without a normal, but I want to at least give it a try. So, how can I use the --ploidy or --center argument here? All my data are consistently below the neutral value of 0 (I suppose due to polyploidy of the normal tissue), and rescaling corrected the visualization. Is this approach correct?
Polyploidy of the normal tissue shouldn't affect the log2 ratios if it's the same ploidy on all autosomes. Aneuploidy of the normal tissue will give confusing results, though.
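A minimal sketch of why uniform polyploidy cancels out, assuming read depth is proportional to copy number and each sample is centered on its own median depth. This is a simplification, not CNVkit's actual normalization, and the function names and toy copy numbers are hypothetical.

```python
import math
from statistics import median

def log2_ratios(sample_depths, normal_depths):
    """Median-center each sample's depths, then take per-bin log2(sample / normal)."""
    s_med = median(sample_depths)
    n_med = median(normal_depths)
    return [math.log2((s / s_med) / (n / n_med))
            for s, n in zip(sample_depths, normal_depths)]

def absolute_copies(log2_ratio, ploidy):
    """Convert a log2 ratio to an integer copy number by simple rounding."""
    return round(ploidy * 2 ** log2_ratio)

# Toy copy numbers per bin; depth is taken to be proportional to copy number.
tumor = [2, 2, 2, 3, 2]              # one single-copy gain in the fourth bin
diploid_normal = [2, 2, 2, 2, 2]
tetraploid_normal = [4, 4, 4, 4, 4]  # uniformly polyploid normal

vs_diploid = log2_ratios(tumor, diploid_normal)
vs_tetraploid = log2_ratios(tumor, tetraploid_normal)
print(vs_diploid)
print(vs_tetraploid)
print([absolute_copies(r, ploidy=2) for r in vs_diploid])
```

Both comparisons give the same log2 ratios in this toy setup. Ploidy only enters when the ratios are converted to absolute copy numbers, which is the step where a ploidy value has to be supplied.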
If all or the majority of your segment means are below 0, then the problem is that too many regions had very low coverage, which skews the log2 ratios toward negative numbers (the distribution will be closer to normal if coverage is good and consistent). Try the --drop-low-coverage option to the segment command to remove the values that are causing at least some of the trouble. You may also want to re-run the pipeline with a larger off-target bin size (e.g. batch --antitarget-avg-size 200000).
So, you suggest that the extremely negative log2 ratios (-12 to -20) that I am getting are due to uniformly bad coverage or to non-specific target baits, right?
I tried the --drop-low-coverage argument and indeed my data look smoother. But my objection is that if you remove low-coverage regions, there must be a bias against losses, unless this option removes low-coverage regions in both normal and tumor samples. I was wondering if there is any argument to specify the minimum number of reads aligned in a specific bin in order to include or exclude that region from the analysis.
Yes, that's it. The log2 scaling actually introduces a bias toward losses in low-coverage regions, which --drop-low-coverage counteracts. The tumor sample itself is a mix of tumor and some normal DNA, so any bins that were adequately captured in the normal cells within the tumor sample will have a log2 value above the --drop-low-coverage threshold. But if you're running germline samples, don't use that flag or you'll screen out true homozygous deletions.
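To see where the bias toward losses comes from, here is a small sketch, assuming a bin's read count fluctuates symmetrically by a few reads around its expected value. The numbers and the helper name are made up for illustration.

```python
import math

def mean_log2_with_noise(expected_reads, noise):
    """Average log2 ratio when the count is `noise` reads above or below expected."""
    low = math.log2((expected_reads - noise) / expected_reads)
    high = math.log2((expected_reads + noise) / expected_reads)
    return (low + high) / 2

# Same +/- 2 read fluctuation in a poorly covered and a well covered bin.
poor = mean_log2_with_noise(expected_reads=4, noise=2)
good = mean_log2_with_noise(expected_reads=100, noise=2)
print(poor, good)
```

The symmetric fluctuation in read counts becomes asymmetric after the log transform, so both averages come out negative, and the shift is far larger in the poorly covered bin.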
Thank you so much, Etal! Do you know what the threshold of --drop-low-coverage is?
It's -15 (in log2 scale).
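As a rough sketch of what such a cutoff does, assuming bins are simply dropped when their log2 value falls below -15. The bin values and the function name are hypothetical, and this is not CNVkit's implementation.

```python
LOW_COVERAGE_THRESHOLD = -15.0  # log2 scale, the value quoted in this thread

def drop_low_coverage(bins, threshold=LOW_COVERAGE_THRESHOLD):
    """Keep only (name, log2) bins whose log2 value is at or above the threshold."""
    return [(name, log2) for name, log2 in bins if log2 >= threshold]

# Hypothetical bins: two ordinary values and two extremely low ones.
bins = [("bin1", 0.1), ("bin2", -0.9), ("bin3", -18.2), ("bin4", -20.0)]
kept = drop_low_coverage(bins)
print(kept)
```

A true homozygous deletion in a germline sample would look like bin3 or bin4 here and would be filtered out, which is the reason given above for not using the flag on germline data.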