I'm trying to use Canvas to find copy number variants (CNVs) in human data. I would appreciate it if someone could clarify what this input is supposed to be:
--b-allele-vcf=VALUE    vcf containing SNV b-allele sites (only sites with PASS in the filter column will be used) (required)
I have called and filtered SNPs for my samples. Is this asking me to provide the set of SNPs (or SNP sites) that are flagged as having the alternate allele in the VCF file? If so, couldn't I just grep for AB=1 and be done?
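A plain `grep AB=1` matches that text anywhere on the line, including inside a longer key that happens to end in `AB`. The key `XAB` in the test is made up for illustration. What `AB` itself means depends on the caller that wrote it, so check the `##INFO=<ID=AB,...>` line in your header. Below is a minimal Python sketch of an exact key lookup on the INFO column instead. The helpers `parse_info` and `info_equals` are hypothetical names.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string (KEY=VALUE;FLAG;...) into a dict; flags map to True."""
    parsed = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue  # an empty INFO column is written as "."
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def info_equals(vcf_line: str, key: str, expected: str) -> bool:
    """True if INFO (8th column) contains exactly key=expected, compared as text."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False
    return parse_info(fields[7]).get(key) == expected
```

Because values are compared as text, `AB=1.00` would not equal `"1"`. Convert to `float` first if your caller writes numbers that way.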
See this manual for details:
http://biorxiv.org/content/biorxiv/suppl/2016/01/13/036194.DC2/036194-2.pdf
"Canvas supports a number of different workflows depending on the input sequencing data. The available modes are:
Germline-WGS: CNV calling of a diploid germline sample from whole genome sequencing data
Somatic-Enrichment: CNV calling of a somatic sample from targeted sequencing data
Somatic-WGS: CNV calling of a somatic sample from whole genome sequencing data
Tumor-normal-enrichment: CNV calling of a tumor/normal pair from targeted sequencing data"
Thank you, Natasha! Why didn't they just say "heterozygous sites" from the beginning? I guess what I have to do, for the purpose of this input file, is grep for SNPs with a 0/1 or 1/0 genotype.
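A grep for "0/1" misses phased calls written as "0|1". It could also hit that string in some other column. Below is a minimal Python sketch that keeps PASS, biallelic SNV records whose genotype is heterozygous, by parsing the GT field. It follows this thread's reading that "b-allele sites" means heterozygous sites. The function names, file names, and `sample_index` parameter are hypothetical. Bgzipped input is read with Python's `gzip` module.

```python
import gzip
import re


def is_het_snv_pass(vcf_line: str, sample_index: int = 0) -> bool:
    """True for a PASS, biallelic SNV record whose sample genotype is heterozygous."""
    if vcf_line.startswith("#"):
        return False  # header line
    f = vcf_line.rstrip("\n").split("\t")
    if len(f) < 10 + sample_index:
        return False  # no FORMAT/sample column for the requested sample
    ref, alt, filt = f[3], f[4], f[6]
    if filt != "PASS":
        return False
    # Single-base REF and ALT: excludes indels and multi-allelic sites ("G,T").
    if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
        return False
    fmt_keys = f[8].split(":")
    values = f[9 + sample_index].split(":")
    if "GT" not in fmt_keys:
        return False
    gt_pos = fmt_keys.index("GT")
    if gt_pos >= len(values):
        return False
    # Split on both separators so phased (0|1) and unphased (0/1) calls match.
    alleles = re.split(r"[/|]", values[gt_pos])
    if len(alleles) != 2 or "." in alleles:
        return False  # haploid or missing genotype
    return alleles[0] != alleles[1]  # 0/1, 1/0, 0|1, 1|0


def write_het_sites(src_path: str, dst_path: str, sample_index: int = 0) -> None:
    """Copy header lines plus heterozygous PASS SNVs to a new plain-text VCF."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("#") or is_het_snv_pass(line, sample_index):
                dst.write(line)
```

For example, `write_het_sites("calls.vcf.gz", "b_allele_sites.vcf")` writes a file you could pass to `--b-allele-vcf`. Check the Canvas manual for whether it must also be bgzipped or indexed.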