Hello,
I am a new CNVkit user. I need to run CNVkit on WGS data from a tumor-normal pair (45x and 29x coverage). I am running it with 4 processes (16 GB each). The program has now been running for 2 days, and I think it has been in the fix step for a whole day, because I can see my reference.cnn. Is this normal in your experience?
My command:
./python2.7 cnvkit.py batch Tumor.recal_sort2_dedup2.realigned2.NTrealign.bam \
--normal Normal_sort_dedup.realigned.recalibrated.NTrealign.bam \
--rlibpath /cnvkit/R-3.2.3/lib64/R/library:/cnvkit/R-3.2.3/lib64:/cnvkit/R-3.2.3/lib64/R/lib \
-t data/access-5k-mappable.hg19.bed \
--fasta hg19_chromosome.fa \
-g data/access-5k-mappable.hg19.bed --split --annotate data/refFlat.txt -p 4 \
--output-reference reference.cnn -y
My BAM header:
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:Clean3_L7_fix_kmer_q15_TrimN_N0_L70 PU:None LB:1 SM:T CN:hcpcg PL:ILLUMINA
@RG ID:Clean3_L8_fix_kmer_q15_TrimN_N0_L70 PU:None LB:1 SM:T CN:hcpcg PL:ILLUMINA
@PG ID:GATK PrintReads VN:3.4-46-gbc02625 CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
@PG ID:MarkDuplicates .....
@PG ID:bwa.6 VN:0.7.12-r1039 CL:./bwa mem -t 4 hg19_chromosome.fa R1.fastq.gz R2.fastq.gz -M -R @RG\tID:Clean3_L8_fix_kmer_q15_TrimN_N0_L70\tPL:ILLUMINA\tPU:None\tLB:1\tSM:T\tCN:hcpcg
Cheers,
James
Thanks Etal, I will try it out with your easy-to-use package and let you know.
Cheers,
James
Hi Etal,
With the WGS data, I see the following error when using cnvkit.py fix:
cnvkit version:
My Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage is empty, following the online manual:
My Clean3_mergedL7L8_150911_FR07887821_targetcoverage:
My command:
My error file:
I searched Google to find the way out, but unfortunately it is beyond my knowledge. Could you help me sort this out? Thank you for your precious time.
PS - Clean3_mergedL7L8_150911_FR07887821 == Tumor.bam and Clean3_L8_FR07887830 == Normal.bam
Cheers,
James
Hi James,
The problem is that the *_antitargetcoverage.cnn and *_targetcoverage.cnn files need to be named *.antitargetcoverage.cnn and *.targetcoverage.cnn instead, i.e. Clean3_mergedL7L8_150911_FR07887821.targetcoverage.cnn and Clean3_mergedL7L8_150911_FR07887821.antitargetcoverage.cnn.
The sample ID is the filename leading up to the first "." character, so there needs to be a "." between "Clean3_mergedL7L8_150911_FR07887821" and "targetcoverage.cnn" or "antitargetcoverage.cnn" for CNVkit to recognize that the samples match. (It's hard-coded this way, sorry.)
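To illustrate the naming rule described above, here is a minimal Python sketch. It is not CNVkit's own code; the helper names sample_id and dotted_name are hypothetical and only mirror the rule as stated (sample ID = filename up to the first "."). It shows why the underscore-named files do not match and what the renamed files would look like.

```python
import os


def sample_id(path):
    """Hypothetical helper: the part of the file name before the first '.'."""
    return os.path.basename(path).split(".", 1)[0]


def dotted_name(path):
    """Hypothetical helper: suggest a name with '.' before (anti)targetcoverage.cnn."""
    folder, name = os.path.split(path)
    for suffix in ("_antitargetcoverage.cnn", "_targetcoverage.cnn"):
        if name.endswith(suffix):
            # Swap the leading '_' of the suffix for a '.'
            fixed = name[: -len(suffix)] + "." + suffix[1:]
            return os.path.join(folder, fixed)
    return path  # already dotted, or not a coverage file


base = "Clean3_mergedL7L8_150911_FR07887821"
target = base + "_targetcoverage.cnn"
antitarget = base + "_antitargetcoverage.cnn"

# With '_' the two files yield different sample IDs, so they do not pair up.
print(sample_id(target) == sample_id(antitarget))
# After renaming, both files share the same sample ID.
print(sample_id(dotted_name(target)), sample_id(dotted_name(antitarget)))
```

The actual renaming could then be done with os.rename(path, dotted_name(path)) or simply with mv on the command line.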
I fixed the name, ran cnvkit.py fix and got the following call:
I read in your paper that target regions have few repeats, so I filtered simple repeats (UCSC hg19) out of access-5k-mappable.hg19.bed and used the filtered access regions as targets, so that the antitarget regions are no longer empty. Then I started over with cnvkit.py batch and the program finished successfully!
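For readers who want to reproduce the repeat-filtering step, here is a rough Python sketch of subtracting repeat intervals from access regions. It is only an assumption about how such a filter could be done, not the exact procedure used above; the function name subtract_intervals and the toy coordinates are made up for illustration, and intervals are treated as half-open BED-style (chrom, start, end) tuples. In practice a tool such as bedtools subtract would probably be the more convenient choice.

```python
def subtract_intervals(regions, repeats):
    """Remove repeat intervals from regions.

    Both arguments are lists of (chrom, start, end) tuples with
    half-open, BED-style coordinates. Hypothetical helper for illustration.
    """
    # Group repeats by chromosome and sort them by start position.
    by_chrom = {}
    for chrom, start, end in repeats:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    kept = []
    for chrom, start, end in regions:
        pos = start  # left edge of the part not yet emitted or masked
        for r_start, r_end in by_chrom.get(chrom, []):
            if r_end <= pos:
                continue  # repeat lies entirely before the current position
            if r_start >= end:
                break  # repeats are sorted, so none of the rest overlap
            if r_start > pos:
                kept.append((chrom, pos, r_start))
            pos = max(pos, r_end)
            if pos >= end:
                break
        if pos < end:
            kept.append((chrom, pos, end))
    return kept


# Toy example with made-up coordinates.
access = [("chr1", 100, 1000)]
simple_repeats = [("chr1", 200, 300), ("chr1", 950, 1200), ("chr2", 0, 50)]
print(subtract_intervals(access, simple_repeats))
```

The linear scan over each chromosome's repeats is fine for a sketch; for genome-wide BED files a sorted sweep or an interval tree would scale better.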
Can I use these results? Is this approach still consistent with the methods in your paper?
Thank you very much for your time. I hope I can use these results, because they look okay.
Cheers,
James
Sorry, I found the issue and fixed it just now.
For your workaround - good idea! I think these results will still be valid. If you mapped the reads with BWA, that should handle ambiguous read alignments acceptably. In any case CNVkit will downweight or filter out low-quality copy number bins at several steps, and CBS should still do fine if the errors are random.
Thanks, it is a great program indeed.
PS - WGS ran 3-4 days using -p 12
James