I have a little problem with CNVkit.
I normally use CNVkit to call CNVs on a whole-exome panel, with no problems and good results. Now I'm trying to do the same on a smaller panel (50 genes). The commands I have used are the same, but with a different BED file.
cnvkit.py batch sample_sorted_ND.bam --normal *sorted_ND.bam -t my-targets.bed --fasta hg19_ref_genome.fa --access data/access-5kb-mappable.hg19.bed --output-reference my_Mreference.cnn --output-dir example1
The problem is that, when I look at the log2 values in the .cnr file, the only regions out of range are BRCA1_14 and BRCA1_13, which correspond to the deletion that is really there.
chr17 41228504 41228631 427_15619_672(BRCA1)_14 142535 -0.540179 0.629573
chr17 41231350 41231416 427_15618_672(BRCA1)_13 119045 -0.672668 0.552469
But many genes show up in the diagram (most of the genes in the panel, but not BRCA1). Why? Shouldn't only these regions appear? Am I doing something wrong? Do I need to add something else to the command line?
My other question is about the depth column, which is not correct because the decimal point is not always recognized.
Can someone help me?
Thank you,
Kira
The decimal issue you're showing here is very strange. One thing I'm wondering: why is 477_192314_7516-XRCC2_1 from chr7 listed in the middle of these BRCA1 targets? The other targets here don't appear to be in genomic order, either. If the .cnn or .cnr files are scrambled, then that could lead to other issues.
For the decimal issue, could you look at your system's locale settings and your Python and pandas versions to see if your shell environment or pandas installation is mixing up "." versus "," as thousands separators? My guess is that when the intermediate .cnn and .cnr files are being written by CNVkit via pandas, the decimal point disappears because it looks like a thousands separator. But since the depth column isn't used for much after constructing the reference (check your reference.cnn file to see if the log2 values are wild there), it might not be the source of your main issue, the undetected BRCA1 deletion.
In your diagram or .cnr, do the log2 ratios look well-centered (mostly near 0), or is there a lot of noise and potentially off-center log2 ratios? If centering is the problem (maybe due to lots of outlier values), you could try re-centering with call -m median, or look further upstream to find the source of the outliers.
If it ultimately looks like a bug in CNVkit, could you try the latest from GitHub and/or tell me which version you're using?
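If it helps, here is a rough sketch of how the checks above could be done by hand, outside of CNVkit. It is not part of CNVkit. It assumes the .cnr file is tab-separated with a header row containing columns named chromosome, start and log2 (please confirm against your own file), and the helper names read_bins, out_of_order_rows, median_log2 and locale_env are made up for this example. The sample text reuses the two BRCA1 rows from the question, with the column order guessed from those rows.

```python
import csv
import io
import os
import statistics


def read_bins(handle):
    """Parse a tab-separated .cnr-like table into (chromosome, start, log2) tuples.

    Assumes header columns named 'chromosome', 'start' and 'log2'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["chromosome"], int(row["start"]), float(row["log2"])) for row in reader]


def out_of_order_rows(bins):
    """Return 0-based indices of rows where genomic order breaks.

    A row is flagged if its chromosome already appeared earlier and was
    interrupted by another chromosome, or if its start coordinate is smaller
    than the previous row's start on the same chromosome.
    """
    seen = set()
    problems = []
    prev_chrom, prev_start = None, -1
    for i, (chrom, start, _log2) in enumerate(bins):
        if chrom != prev_chrom:
            if chrom in seen:
                problems.append(i)  # chromosome reappears after a different one
            seen.add(chrom)
        elif start < prev_start:
            problems.append(i)  # start coordinate went backwards
        prev_chrom, prev_start = chrom, start
    return problems


def median_log2(bins):
    """Median of the log2 column; values far from 0 hint at a centering problem."""
    return statistics.median(log2 for _chrom, _start, log2 in bins)


def locale_env():
    """Collect the environment variables that usually decide number formatting."""
    return {name: os.environ.get(name) for name in ("LC_ALL", "LC_NUMERIC", "LANG")}


# Two rows copied from the question; the header line is an assumption.
SAMPLE = (
    "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
    "chr17\t41228504\t41228631\t427_15619_672(BRCA1)_14\t142535\t-0.540179\t0.629573\n"
    "chr17\t41231350\t41231416\t427_15618_672(BRCA1)_13\t119045\t-0.672668\t0.552469\n"
)

bins = read_bins(io.StringIO(SAMPLE))
print("rows breaking genomic order:", out_of_order_rows(bins))
print("median log2:", median_log2(bins))
print("locale variables:", locale_env())
```

To run it on a real file, replace io.StringIO(SAMPLE) with an open handle to your own .cnr. If the locale variables point to a locale that uses "," as the decimal mark, that would be consistent with the separator guess above, though it would not prove it.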