Hello,
I am trying to use CNVkit with paired WGS data, and I got an error when building the normal reference.
1) $ cnvkit.py access hg19.fa -o access.hg19.bed
2) $ cnvkit.py autobin GroupA_Normal.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt
$ cnvkit.py autobin GroupA_Tumor.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt
$ cnvkit.py autobin GroupB_Normal.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt
$ cnvkit.py autobin GroupB_Tumor.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt
3) $ cnvkit.py coverage -p 5 GroupA_Normal.bam GroupA_Normal.target.bed -o GroupA_Normal.targetcoverage.cnn
$ cnvkit.py coverage -p 5 GroupA_Tumor.bam GroupA_Tumor.target.bed -o GroupA_Tumor.targetcoverage.cnn
$ cnvkit.py coverage -p 5 GroupB_Normal.bam GroupB_Normal.target.bed -o GroupB_Normal.targetcoverage.cnn
$ cnvkit.py coverage -p 5 GroupB_Tumor.bam GroupB_Tumor.target.bed -o GroupB_Tumor.targetcoverage.cnn
We also computed the 'antitarget' coverage in the same way.
4) The error occurred at this step: $ cnvkit.py reference Group*Normal.{,anti}targetcoverage.cnn --fasta hg19.fa -o my_reference.cnn
..... Correcting for GC bias...
Correcting for density bias...
Loading target GroupB_Normal.targetcoverage.cnn
Traceback (most recent call last):
  File "/root/anaconda3/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/commands.py", line 518, in _cmd_reference
    args.do_rmask)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/reference.py", line 55, in do_reference
    do_gc, do_edge, False)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/reference.py", line 237, in combine_probes
    % (fname, filenames[0]))
RuntimeError: GroupB_Normal.targetcoverage.cnn bins do not match those in GroupA_Normal.targetcoverage.cnn
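The RuntimeError says the two normal .cnn files were not binned identically, which is likely because each BAM got its own bin BED from a separate 'autobin' run. One quick way to confirm this is to compare the bin coordinates directly. This is a minimal sketch assuming (not confirmed in this thread) that a .cnn file is tab-separated with one header line and chromosome, start and end as its first three columns; the function name same_bins is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether two CNVkit .cnn files use identical bins.
# ASSUMPTION: tab-separated files, one header line, and the first three
# columns are chromosome, start, end.
same_bins() {
    local a="$1" b="$2"
    # Compare only the bin coordinates (columns 1-3), skipping the header line.
    if diff <(tail -n +2 "$a" | cut -f1-3) <(tail -n +2 "$b" | cut -f1-3) >/dev/null; then
        echo "bins match"
        return 0
    else
        echo "bins differ"
        return 1
    fi
}
```

For example, `same_bins GroupA_Normal.targetcoverage.cnn GroupB_Normal.targetcoverage.cnn` should print "bins differ" for the files that triggered the error above.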
Please let me know if you have any advice or suggestions.
Thank you in advance.
For case-control samples, the manual on the homepage says the right approach is to put all the samples in one folder and run 'batch'.
However, I don't have enough disk space for that, so instead I use the 'access' output as the bin file for the reference genome and build the reference from the normal samples only.
It seems to work well, but is it all right to skip the autobin step? Thank you in advance.
1) cnvkit.py access hg38.fa -o access.hg38.bed
2-1) cnvkit.py coverage -p 4 GroupA_Normal.bam access.hg38.bed -o GroupA_Normal.targetcoverage.cnn
2-2) cnvkit.py coverage -p 4 GroupB_Normal.bam access.hg38.bed -o GroupB_Normal.targetcoverage.cnn
2-3 and 2-4) Calculate coverage of tumor samples using 'access.hg38.bed'
3) cnvkit.py reference Group*Normal.targetcoverage.cnn -f hg38.fa -o pooled-normal_reference.cnn
4) cnvkit.py fix GroupA_Tumor.targetcoverage.cnn pooled-normal_reference.cnn -o GroupA_Tumor.cnr
5) cnvkit.py segment -m hmm GroupA_Tumor.cnr -o GroupA_Tumor.cns
(Likewise for GroupB_Tumor.)
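This second workflow avoids the bin mismatch because every sample is binned with the same BED file. Steps 2 and 3 can be sketched as a loop using the file names from the post. The function name and the CNVKIT override variable are hypothetical conveniences (set CNVKIT=echo to print the commands instead of running them), BINS can point at whatever single BED file is used for binning, and the 'fix' and 'segment' steps are left out.

```shell
#!/usr/bin/env bash
# Sketch: run 'coverage' for every sample against ONE shared bin file,
# then build the pooled reference from the normals only.
# CNVKIT, BINS and REF_FASTA are hypothetical settings to adapt.
CNVKIT="${CNVKIT:-cnvkit.py}"   # set CNVKIT=echo for a dry run
BINS="access.hg38.bed"
REF_FASTA="hg38.fa"

run_coverage_and_reference() {
    local sample
    for sample in GroupA_Normal GroupA_Tumor GroupB_Normal GroupB_Tumor; do
        # The same BED for every sample gives every .cnn identical bins.
        "$CNVKIT" coverage -p 4 "$sample.bam" "$BINS" -o "$sample.targetcoverage.cnn"
    done
    # Only the normal samples go into the pooled reference.
    "$CNVKIT" reference GroupA_Normal.targetcoverage.cnn GroupB_Normal.targetcoverage.cnn \
        -f "$REF_FASTA" -o pooled-normal_reference.cnn
}
```

Running `CNVKIT=echo` followed by `run_coverage_and_reference` lists the five commands so they can be checked before spending disk space and CPU time.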
I suggest adding the 'target' command with a bin size of 5000 bp after step 1, then using the resulting BED file as input to the 'coverage' command. Otherwise each bin will span an entire accessible region, so the bins will be huge.
In the 'batch' command, does 'autobin' run only on the median file of the tumor samples, or on the median file of all tumor and normal samples?
Only the normal samples.
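To see why the raw access BED gives huge bins, and what binning it at about 5000 bp means, here is a simplified sketch that cuts each interval into consecutive windows of at most a given size. It is not CNVkit's own 'target' algorithm, whose bin boundaries may differ; the function name split_bed is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split each BED interval into consecutive windows of at most
# BIN_SIZE bp; the last window of an interval may be shorter.
# Simplified stand-in for 'cnvkit.py target', NOT its actual algorithm.
split_bed() {
    local bin_size="$1" bed="$2"
    awk -v size="$bin_size" 'BEGIN { OFS = "\t" }
        # Skip comment/track/browser lines and lines without 3 columns.
        /^(#|track|browser)/ || NF < 3 { next }
        {
            start = $2 + 0; end = $3 + 0
            for (s = start; s < end; s += size) {
                e = (s + size < end) ? s + size : end
                print $1, s, e
            }
        }' "$bed"
}
```

For example, `split_bed 5000 access.hg38.bed > access.5kb.bed` turns a 12,000 bp accessible region into three bins (5000, 5000 and 2000 bp) instead of one.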