Hello everyone,
I have a couple of tumor-germline paired samples, and I tried running CNVkit this way:
cnvkit.py batch *Tumor.bam --normal *Normal.bam \
--targets my_baits.bed \
--fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
--output-reference my_reference.cnn --output-dir results/ \
--diagram --scatter
When I do this, I get the following output, and the analysis stops after creating the .cnr file. No .cns file is written, and the scatter and diagram PDFs are not produced.
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> Loading sampleA_Normal.antitargetcoverage.cnn
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> Loading sampleA_Normal.antitargetcoverage.cnn
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> ....local/lib/python3.6/site-packages/pandas/core/frame.py:7116: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.
> To accept the future behavior, pass 'sort=False'.
> To retain the current behavior and silence the warning, pass 'sort=True'.
>
> sort=sort,
> Calculating average bin coverages
> Calculating bin spreads
> Targets: 1162 (13.7499%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)
> Antitargets: 19439 (100.0000%) bins failed filters
> Wrote my_reference.cnn with 27890 regions
> Running 1 samples in 8 processes (that's 8 processes per bam)
> Running the CNVkit pipeline on SampleB_Tumor.bam
> ... Processing reads in SampleB_Tumor.bam
> Time: 76.712 seconds (63585 reads/sec, 110 bins/sec)
> Summary: #bins=8451, #reads=4877720, mean=577.1767, min=0.0, max=3535.910891089109
> Percent reads in regions: 92.246 (of 5287755 mapped)
> Wrote SampleB_Tumor.targetcoverage.cnn with 8451 regions
> Processing reads in SampleB_Tumor.bam
> Time: 2.223 seconds (0 reads/sec, 8746 bins/sec)
> Summary: #bins=19439, #reads=0, mean=0.0000, min=0.0, max=0.0
> Percent reads in regions: 0.000 (of 5287755 mapped)
> Wrote SampleB_Tumor.antitargetcoverage.cnn with 19439 regions
> Processing target: SampleB_Tumor
> Keeping 7289 of 8451 bins
> Correcting for GC bias...
> Correcting for density bias...
> Processing antitarget: SampleB_Tumor
> Keeping 0 of 19439 bins
> WARNING: most bins have no or very low coverage; check that the right BED file was used
> Wrote SampleB_Tumor.cnr with 7289 regions
> Segmenting SampleB_Tumor.cnr
> ... Segmenting with method 'cbs', significance threshold 0.0001, in 8 processes
> .../cnvkit/cnvlib/smoothing.py:197: RuntimeWarning: invalid value encountered in greater
> bad_idx = (y > x.max()) | (y < x.min())
> .../cnvkit/cnvlib/smoothing.py:197: RuntimeWarning: invalid value encountered in less
> bad_idx = (y > x.max()) | (y < x.min())
The RuntimeWarnings repeat several times, and then the command exits without producing any further output. Has anyone else experienced this?
Any input is appreciated! My CNVkit version is 0.9.6.dev0 and my Python version is 3.6.1. Thank you.
Have you checked this? The warning says to check that the right BED file was used, and your command passes --targets my_baits.bed; are these the same as the target regions that you are supposed to pass?
Hi Steve, I might be mistaken, but I pass my target regions (baits) as a BED file to --targets. I see the warning above for all of my tumor samples, and I had initially interpreted it as meaning that, since antitarget regions are not targeted, there is little or no coverage there. Should I be doing something different?
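The log reports "Keeping 0 of 19439 bins" for the antitargets, so one way to follow up is to look at the depth column of the coverage files directly. Below is a minimal sketch, not part of CNVkit, that counts zero-depth bins per file; it assumes each .cnn file is tab-separated with a header row containing a "depth" column, and the script name and file paths in the example comment are hypothetical.

```python
"""Count zero-depth bins in CNVkit coverage (.cnn) files (sketch).

Assumption: each .cnn file is tab-separated with a header row that
contains a "depth" column; the column order is not relied upon.
"""
import csv
import sys


def depth_summary(path, low=0.0):
    """Return (total_bins, bins_with_depth_at_or_below_low) for one .cnn file."""
    total = 0
    low_bins = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or "depth" not in reader.fieldnames:
            raise ValueError("{}: no 'depth' column in header".format(path))
        for row in reader:
            total += 1
            if float(row["depth"]) <= low:
                low_bins += 1
    return total, low_bins


if __name__ == "__main__":
    # Example (hypothetical script name and paths):
    #   python cnn_depth.py results/SampleB_Tumor.targetcoverage.cnn \
    #       results/SampleB_Tumor.antitargetcoverage.cnn
    for cnn_path in sys.argv[1:]:
        n_bins, n_low = depth_summary(cnn_path)
        pct = 100.0 * n_low / n_bins if n_bins else 0.0
        print("{}: {} of {} bins have depth 0 ({:.1f}%)".format(
            cnn_path, n_low, n_bins, pct))
```

Comparing the target and antitarget files side by side makes it easy to see whether essentially all antitarget bins received no reads, as the "#reads=0" summary line above suggests.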
For reference, the CNVkit portion of our pipeline is here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/ad4b2f42efb56d0753e58c2e07c4ef6dc7fc2337/main.nf#L3861 (we also do some extra custom processing).
Our targets .bed files are here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/tree/ad4b2f42efb56d0753e58c2e07c4ef6dc7fc2337/targets
I did not write this portion of the pipeline so my expertise is limited, but this is what we've got.
Hi Steve,
Thank you for your input and advice! I was able to run the same command with Python 2 and the same BED file without issues, but I have been running into problems ever since I switched to Python 3. Nonetheless, I will refer to your pipeline to better understand how I should be running CNVkit.
Oh well, if it worked in Python 2, then you should keep it that way. Use a conda installation or a container of some sort to isolate it. This is our Dockerfile for building a CNVkit container: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/master/containers/cnvkit-0.9.0/Dockerfile ; it uses the Ubuntu 16.04 default Python, which should be 2.7.
For our sequencing panel, there were two .bed files provided: probes.bed (baits?) and targets.bed. The regions contained in the two files are different. The latter (targets.bed) is the one that we use for CNVkit. You might want to check that there is not some other .bed file related to the sequencing that you are supposed to use instead of your baits.bed file.
If probes.bed indicates the captured regions, I'd go with that over targets.bed for CNV calling; not all of the requested targets may have been successfully captured by the panel kit.
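To see how different probes.bed and targets.bed actually are, one option is to measure how many target bases fall inside the probe intervals. The sketch below is a stdlib-only illustration, not a CNVkit command; it assumes plain tab-separated BED with chrom/start/end in the first three columns and 0-based, half-open coordinates, and the script name bed_cover.py is hypothetical.

```python
"""Measure how many bases of one BED file fall inside another (sketch).

Assumptions: plain tab-separated BED, chrom/start/end in the first three
columns, 0-based half-open coordinates; header/track lines are skipped.
"""
import sys
from collections import defaultdict


def read_bed(path):
    """Return {chrom: [(start, end), ...]}, sorted, with overlaps merged."""
    raw = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def bed_bp(bed):
    """Total number of bases in a merged BED dict."""
    return sum(end - start for ivs in bed.values() for start, end in ivs)


def covered_bp(query, subject):
    """Count bases of `query` lying inside `subject` (both merged and sorted)."""
    total = 0
    for chrom, q_ivs in query.items():
        s_ivs = subject.get(chrom, [])
        j = 0
        for q_start, q_end in q_ivs:
            # Skip subject intervals that end before this query interval starts.
            while j < len(s_ivs) and s_ivs[j][1] <= q_start:
                j += 1
            k = j
            while k < len(s_ivs) and s_ivs[k][0] < q_end:
                total += min(q_end, s_ivs[k][1]) - max(q_start, s_ivs[k][0])
                k += 1
    return total


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (hypothetical script name): python bed_cover.py targets.bed probes.bed
    targets, probes = read_bed(sys.argv[1]), read_bed(sys.argv[2])
    total = bed_bp(targets)
    hit = covered_bp(targets, probes)
    print("{} of {} target bases ({:.1f}%) lie within probes".format(
        hit, total, 100.0 * hit / total if total else 0.0))
```

A low percentage would indicate that many requested targets are not covered by any probe, which is the situation described above. Note that chromosome names must match exactly ("chr1" and "1" are treated as different chromosomes).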
Hello everyone, I need to call CNVs from my human WGS data. I searched through many tutorials but did not find a clear procedure. I am trying to call CNVs using CNVkit. Can anyone tell me the procedure? Also, the CNVkit pipeline uses a file called baits.bed for the -t TARGETS option. Where can I get that file? Do I need to generate it myself, and if so, how can I generate it?
At least according to the documentation, it seems that for WGS you can use the batch --method wgs option. There is more information here: https://cnvkit.readthedocs.io/en/stable/nonhybrid.html , which also includes some extra options and notes. This will calculate the needed targets on the fly if none are provided, though if you only want, e.g., genic regions, you can get or make a BED file that contains only the genic regions.
Thank you for the information.
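For the genic-regions case mentioned above, one way to make such a BED file is to collapse a UCSC refFlat table into one interval per gene. This is a minimal sketch under stated assumptions: the input follows the standard UCSC refFlat column order (geneName, name, chrom, strand, txStart, txEnd, ...), its coordinates are 0-based and half-open like BED, and the script and file names in the example comment are hypothetical. Each interval spans the gene's outermost transcript coordinates, so introns are included.

```python
"""Build a gene-span BED file from a UCSC refFlat table (sketch).

Assumption: standard refFlat column order
(geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...),
with 0-based, half-open transcript coordinates (same convention as BED).
"""
import sys


def refflat_to_gene_bed(lines):
    """Return sorted BED rows (chrom, start, end, gene), one per gene per chromosome."""
    spans = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        gene, chrom = fields[0], fields[2]
        start, end = int(fields[4]), int(fields[5])
        key = (chrom, gene)
        if key in spans:
            # Widen the span to cover every transcript of this gene.
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    rows = [(chrom, s, e, gene) for (chrom, gene), (s, e) in spans.items()]
    # Plain string sort on chromosome name (chr10 sorts before chr2).
    rows.sort(key=lambda r: (r[0], r[1], r[2], r[3]))
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (hypothetical names): python genes_bed.py refFlat.txt genes.bed
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for chrom, start, end, gene in refflat_to_gene_bed(src):
            dst.write("{}\t{}\t{}\t{}\n".format(chrom, start, end, gene))
```

Keying on (chromosome, gene) rather than gene name alone keeps genes that appear on more than one chromosome as separate intervals instead of merging them into one bogus span.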