What is the best protocol for getting accurate sex chromosome copy number predictions?
Currently I have a pool of normals of mixed gender, which I am passing into the reference command. I don't set the -y option to create a male reference. There appears to be no option to give the gender of each input coverage file. Yet for the -y option, the manual says:
Create a male reference: shift female samples' chrX log-coverage by -1, so the reference chrX average is -1. Otherwise, shift male samples' chrX by +1, so the reference chrX average is 0.
I assume that it automatically detects the gender and accounts for their sex chromosomes? Or is there a way to pass in the exact gender of each input normal sample?
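As a rough illustration of the manual excerpt above, the shift it describes can be sketched as follows. This is not CNVkit's actual code: the helper name shift_chrx and the idealized log2 values are assumptions made for the example. It shows why each normal sample's gender has to be known or inferred before pooling.

```python
def shift_chrx(log2_values, sample_is_female, male_reference=False):
    """Shift one sample's chrX log2 coverages as the manual excerpt describes.

    Hypothetical helper for illustration only; not part of CNVkit.
    """
    if male_reference and sample_is_female:
        offset = -1.0  # female chrX lowered so the reference chrX averages -1
    elif not male_reference and not sample_is_female:
        offset = 1.0   # male chrX raised so the reference chrX averages 0
    else:
        offset = 0.0   # sample already matches the reference convention
    return [value + offset for value in log2_values]


# Idealized chrX log2 values relative to diploid autosomes (assumed, not real data).
female_chrx = [0.0, 0.0]   # two copies of X
male_chrx = [-1.0, -1.0]   # one copy of X

# Default reference: male samples are shifted up by 1.
print(shift_chrx(male_chrx, sample_is_female=False))
# Male reference (-y): female samples are shifted down by 1.
print(shift_chrx(female_chrx, sample_is_female=True, male_reference=True))
```

If a sample's gender is guessed wrongly, its chrX values are shifted in the wrong direction (or not at all), which skews the pooled chrX average.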
I then pass this pooled reference to fix. The call command, unlike the reference command, has a -g option to specify the gender of the input sample.
Are there any critical steps that I am missing for getting accurate sex chromosome copy number predictions?
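For reference, the workflow described above could be laid out roughly as below. This is only a sketch: the file names are hypothetical placeholders, the segment step is not mentioned in the post and is assumed here because call is normally run on segmented output, and -g is the call option named above. The commands are built and printed, not executed.

```python
import shlex


def build_pipeline(normal_cnns, sample, sample_gender):
    """Return the CNVkit commands for one sample as argument lists.

    All file names are hypothetical placeholders chosen for illustration.
    """
    # Pool the mixed-gender normals into one reference (no -y, as in the post).
    reference = ["cnvkit.py", "reference", *normal_cnns, "-o", "pooled_ref.cnn"]
    # Correct the sample's coverages against the pooled reference.
    fix = ["cnvkit.py", "fix",
           f"{sample}.targetcoverage.cnn",
           f"{sample}.antitargetcoverage.cnn",
           "pooled_ref.cnn", "-o", f"{sample}.cnr"]
    # Assumed intermediate step: segment the corrected ratios.
    segment = ["cnvkit.py", "segment", f"{sample}.cnr", "-o", f"{sample}.cns"]
    # State the sample's gender explicitly when calling copy number.
    call = ["cnvkit.py", "call", f"{sample}.cns",
            "-g", sample_gender, "-o", f"{sample}.call.cns"]
    return [reference, fix, segment, call]


normals = ["normal1.targetcoverage.cnn", "normal1.antitargetcoverage.cnn",
           "normal2.targetcoverage.cnn", "normal2.antitargetcoverage.cnn"]
for command in build_pipeline(normals, "tumor", "male"):
    print(shlex.join(command))
```

Only the last step takes the sample's gender explicitly here; the reference step has to infer the gender of each normal on its own, which is the gap the question is about.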
@Etal: there is indeed a problem with the automatic gender detection when using the reference command.
My sample is from a male. The command I use is:
For the targetcoverage.cnn, gender is wrong:
For the antitargetcoverage.cnn, gender is correct:
Since I have only 1 normal sample, I cannot discard it from the reference pool. Is there a way to edit the script to bypass automatic gender detection?
Thanks, I'll see about adding a --gender option to the reference command in the next release.
Workaround: It looks like the coverage of the Y-chromosome targets in your sample was poor. Look at the 'log2' column in normal_ref.cnn to identify the targets on Y that were poorly captured in your normal sample, then delete those targets from your target BED file or the source targetcoverage.cnn files (make sure they all match) and rebuild the reference. If only the well-captured targets on Y remain, gender detection should work better. (If no Y targets remain, the pipeline will still work.)
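One way to do the inspection step of the workaround is sketched below. It assumes the .cnn file is tab-separated with a header row containing chromosome, start, end, gene and log2 columns. The cutoff of -3.0, the chromosome names "chrY"/"Y", and the background bin labels "Background"/"Antitarget" are assumptions to adjust for your data, and the demo rows are made up.

```python
import csv
import io


def poor_y_targets(cnn_file, threshold=-3.0, y_names=("chrY", "Y"),
                   skip_genes=("Background", "Antitarget")):
    """Yield (chromosome, start, end, gene, log2) for poorly covered Y targets.

    threshold, y_names and skip_genes are assumed defaults, not CNVkit settings.
    """
    reader = csv.DictReader(cnn_file, delimiter="\t")
    for row in reader:
        if row["chromosome"] not in y_names:
            continue  # only chromosome Y is of interest here
        if row["gene"] in skip_genes:
            continue  # ignore off-target (background) bins
        log2 = float(row["log2"])
        if log2 < threshold:
            yield (row["chromosome"], int(row["start"]), int(row["end"]),
                   row["gene"], log2)


# Made-up rows standing in for normal_ref.cnn.
example = io.StringIO(
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chrX\t100\t200\tGENE_A\t-0.9\n"
    "chrY\t100\t200\tGENE_B\t-1.1\n"
    "chrY\t300\t400\tGENE_C\t-8.5\n"
    "chrY\t500\t900\tBackground\t-9.0\n"
)
for target in poor_y_targets(example):
    print(*target, sep="\t")
```

To run it on a real file, open normal_ref.cnn and pass the file handle instead of the in-memory example. The coordinates it prints are the ones to remove from the target BED file before rebuilding the reference.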
I changed the statistical test in the development version of CNVkit on GitHub, so if you're able to try that it might deliver a better result. But given that the majority of targets on Y had poor coverage, it might still be misled into thinking there is no Y chromosome in your sample.
To hard-code your sample's gender in the script, you can edit cnvlib/reference.py line 99 or so, where it says:
Replace the method call with False to treat the sample as male.
Ok, I'll try that. Thanks!