I am at the step of deriving absolute integer copy number for each segment, and the documentation states that one can pass in a VCF file of SNPs in the tumour sample:
cnvkit.py call Sample.cns -y -v Sample.vcf -o Sample.call.cns
This should extract b-allele frequencies and allow for the calculation of major and minor copy number. I am having trouble finding the exact VCF format cnvkit needs in order for this to work.
I've called SNPs using bcftools, in the tumor and normal separately, and then intersected the positions found in both (using bcftools isec). I then passed in a VCF file of this format:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleA
1 926351 . C T 11.1 . DP=2;VDB=0.0106;AF1=1;AC1=2;DP4=0,0,1,1;MQ=60;FQ=-33 GT:PL:GQ 1/1:42,6,0:9
1 1474167 . A G 8.65 . DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=60;FQ=-30 GT:PL:GQ 1/1:38,3,0:5
When running the command, I get messages like this for each record:
Skipping 1:926351 C; unsure how to get alternative allele count: CallData(GT=1/1, PL=[42, 6, 0], GQ=9)
Skipping 1:1474167 A; unsure how to get alternative allele count: CallData(GT=1/1, PL=[38, 3, 0], GQ=5)
It seems that cnvkit doesn't know how to extract the allele counts from this VCF file. Does cnvkit accept VCF output from a separate SNP calling tool?
Thanks,
Thanks for the reply.
Based on the VCF header produced by bcftools:
So it would appear that the allele counts are in the DP4 field of the INFO column, as comma-separated values, rather than in a per-sample AD field. This is the direct output of bcftools. I'll try another germline mutation caller that writes the allele data to AD.
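For anyone else hitting this, here is a minimal sketch of how the counts could be pulled out of DP4 by hand. It assumes DP4 holds four values in the order ref-forward, ref-reverse, alt-forward, alt-reverse, which is how the samtools/bcftools header describes the field. The helper names dp4_counts and alt_fraction are made up for illustration and are not part of cnvkit or bcftools.

```python
def dp4_counts(info):
    """Return (ref_count, alt_count) from the DP4 entry of a VCF INFO string.

    Assumes DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse.
    Returns None if the INFO string has no DP4 entry.
    """
    for entry in info.split(";"):
        if entry.startswith("DP4="):
            ref_fwd, ref_rev, alt_fwd, alt_rev = (
                int(x) for x in entry[len("DP4="):].split(",")
            )
            return ref_fwd + ref_rev, alt_fwd + alt_rev
    return None


def alt_fraction(info):
    """Fraction of DP4 reads supporting the alternative allele, or None."""
    counts = dp4_counts(info)
    if counts is None:
        return None
    ref, alt = counts
    total = ref + alt
    # Guard against records where no reads passed the DP4 quality filter.
    return alt / total if total else None


# INFO column of the first record shown above
info = "DP=2;VDB=0.0106;AF1=1;AC1=2;DP4=0,0,1,1;MQ=60;FQ=-33"
print(dp4_counts(info))    # (0, 2)
print(alt_fraction(info))  # 1.0
```

With only one or two reads per site, as in the records above, the resulting fractions are very noisy, so this only illustrates where the numbers live in the file.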