CNVKIT - unable to produce scatter and diagram pdfs
2
0
Entering edit mode
5.4 years ago
ww22runner ▴ 60

Hello everyone,

I have a couple of tumor-germline paired samples and I tried running cnvkit this way:

cnvkit.py batch *Tumor.bam --normal *Normal.bam \
    --targets my_baits.bed  \
    --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
    --output-reference my_reference.cnn --output-dir results/ \
    --diagram --scatter

When I do this, I get the following output and the analysis stops with the creation of a cnr file. I do not see a cns file/the scatter and diagram pdfs being produced.

> Correcting for GC bias... Correcting for RepeatMasker bias...
> Loading sampleA_Normal.antitargetcoverage.cnn
> Correcting for GC bias... Correcting for RepeatMasker bias... 
> Loading sampleA_Normal.antitargetcoverage.cnn
> Correcting for GC bias... Correcting for RepeatMasker bias...
> ....local/lib/python3.6/site-packages/pandas/core/frame.py:7116:
> FutureWarning: Sorting because non-concatenation axis is not aligned.
> A future version of pandas will change to not sort by default.
> To accept the future behavior, pass 'sort=False'.
> To retain the current behavior and silence the warning, pass
> 'sort=True'.
> 
>   sort=sort, Calculating average bin coverages Calculating bin spreads
> Targets: 1162 (13.7499%) bins failed filters (log2 < -5.0, log2 > 5.0,> spread > 1.0) 
> Antitargets: 19439 (100.0000%) bins failed filters 
> Wrote my_reference.cnn  with 27890 regions 
> Running 1 samples in 8 processes (that's 8  processes per bam) 
> Running the CNVkit pipeline on SampleB_Tumor.bam
> ... Processing reads in SampleB_Tumor.bam 
> Time: 76.712 seconds > (63585 reads/sec, 110 bins/sec) 
> Summary: #bins=8451, #reads=4877720,  mean=577.1767, min=0.0, max=3535.910891089109 
> Percent reads in regions: 92.246 (of 5287755 mapped)
>  Wrote SampleB_Tumor.targetcoverage.cnn  with 8451 regions 
> Processing reads in SampleB_Tumor.bam 
> Time: 2.223  seconds (0 reads/sec, 8746 bins/sec) 
> Summary: #bins=19439, #reads=0,  mean=0.0000, min=0.0, max=0.0 
> Percent reads in regions: 0.000 (of 5287755 mapped) 
> Wrote  SampleB_Tumor.antitargetcoverage.cnn  with 19439 regions 
> Processing target: SampleB_Tumor
> Keeping 7289 of 8451 bins Correcting for GC bias... 
> Correcting for density bias...
> Processing antitarget: SampleB_Tumor
> Keeping 0 of 19439 bins
> WARNING: most bins have no or very low coverage; check that the right  BED file was used 
> Wrote SampleB_Tumor.cnr with 7289 regions 
> Segmenting SampleB_Tumor.cnr
> ... Segmenting with method 'cbs', significance threshold 0.0001, in 8  processes
> .../cnvkit/cnvlib/smoothing.py:197:
> RuntimeWarning: invalid value encountered in greater   bad_idx = (y >
> x.max()) | (y < x.min())
> .../cnvkit/cnvlib/smoothing.py:197:
> RuntimeWarning: invalid value encountered in less   bad_idx = (y >
> x.max()) | (y < x.min())

I see the RuntimeWarnings repeat several times and the command ends with no other output being produced. Has anyone else experienced this?

Any input is appreciated! My cnvkitversion is 0.9.6.dev0 and python version is 3.6.1 . Thank you.

cnvkit • 3.6k views
ADD COMMENT
1
Entering edit mode

Keeping 0 of 19439 bins WARNING: most bins have no or very low coverage; check that the right BED file was used

Have you checked this? Your command says --targets my_baits.bed; are these the same as the target regions that you are supposed to pass?

ADD REPLY
0
Entering edit mode

Hi Steve, I might be mistaken but I send in my target regions (baits) as a bed file to --targets. I see the message above for all of my tumor samples and had initially understood this as since anti target sites are not targeted sites, there is little/no coverage there. Should I be doing something different?

ADD REPLY
1
Entering edit mode

For reference, the CNVkit portion of our pipeline is here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/ad4b2f42efb56d0753e58c2e07c4ef6dc7fc2337/main.nf#L3861 we do some extra custom processing as well

Our targets .bed files are here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/tree/ad4b2f42efb56d0753e58c2e07c4ef6dc7fc2337/targets

I did not write this portion of the pipeline so my expertise is limited, but this is what we've got.

ADD REPLY
0
Entering edit mode

Hi Steve,

Thank you for your input and advise! I could run this script using Python 2 with the same bed file without issues but have been facing problems ever since I switched over to Python3. Nonetheless, I will refer to your pipeline to better understand how I could be running cnvkit.

ADD REPLY
0
Entering edit mode

Oh well if it worked in Python 2, then you should keep it that way. Use a conda installation or container of some sort to isolate it. This is our Dockerfile for building a CNVKit container: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/master/containers/cnvkit-0.9.0/Dockerfile ; its using the Ubuntu 16.04 default Python which should be 2.7

ADD REPLY
0
Entering edit mode

for our sequencing panel, there were two .bed files provided: probes.bed (baits?) and targets.bed. The regions contained in both are different. The latter is the one that we use for CNVKit (targets.bed). You might want to check that there is not some other .bed file related to the sequencing that you are supposed to use instead of your baits.bed file.

ADD REPLY
0
Entering edit mode

If probes.bed indicates the captured regions, I'd go with that over targets.bed for CNV calling -- not all of the requested targets may have been successfully captured by the panel kit.

ADD REPLY
0
Entering edit mode

Hello everyone, I need to call the CNV from my hunam WGS data. I searched for many tutorial, but I did'nt get any proper procedure. I am trying to call the CNV using CNVkit. Can any one tell me the procedure and in that cnvkit pipeline there is a file called bait.bed file used as (-t TARGETS) option. where will get that file. Do I need to generate on my own, if that so how can I able to generate that.

ADD REPLY
0
Entering edit mode

At least according to the documentation, it seems for WGS you can use the batch --method wgs option. There is more information here: https://cnvkit.readthedocs.io/en/stable/nonhybrid.html which also includes some extra options and notes. This will calculate the needed targets on the fly if there is not one provided - though if you only want i.e. genic regions then you can get/make a bed file that contains only the genic regions.

ADD REPLY
0
Entering edit mode

Thank you for the information.

ADD REPLY
0
Entering edit mode
5.3 years ago
Eric T. ★ 2.8k

It could be that numpy or pandas became more stringent in a recent release and so datapoints that used to pass through are now being dropped or set to NA -- just speculating.

There are some changes worth considering in the development version of CNVkit, v0.9.7.dev0 or so. (The master branch on github.) Could you try that and see how it goes?

ADD COMMENT
0
Entering edit mode

Hi Eric,

I kind of had a similar problem. When using cnvkit.py coverage sample.bam antitarget.bed -o sample.antitargetcoverage.cnn, I got summary like this: Time: 14.480 seconds (0 reads/sec, 3945 bins/sec) Summary: #bins=57119, #reads=0, mean=0.0000, min=0.0, max=0.0 Percent reads in regions: 0.000 (of 23342057 mapped)

I opened the output cnn file and the depth column are all 0. Someone had used the same command on the same bam file and bed file but got different outputs with all the depths between 0 and 1.6. Do you have any idea of what went wrong?

ADD REPLY
0
Entering edit mode
6 months ago
Anitha ▴ 10

While running with this batch option batch --method wgs option. I have used this command " cnvkit.py batch tumor.bam -n normal.bam -m wgs -f GRCh38.p14.genome.fa". I got many files as an output, in that I got the .cns files only for the tumor sample. I have some doubts regarding this. Is this only calls the somatic cnv??. Does this tool will perform germline cnv calling. If so I need steps for calling both the somatic and germline CNV.

ADD COMMENT
0
Entering edit mode

The normal samples is pooled into single reference for cohort. In this I have a doubt, here the normal sample referred as sample which took from the uninfected patient (non-cancer)?? or the sample which is took from the normal tissue (from the infected cancer patient)??. For doing cohort which sample do i need to use??

ADD REPLY

Login before adding your answer.

Traffic: 2045 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6