Question

Mutect output for CNV calling

4.8 years ago

vctrm67 ▴ 80

I am a bit confused as to why CNV callers require Mutect output for CNV calling. Callers such as CNVkit specify that it has something to do with allelic imbalance but I'm not completely sure. Could someone elaborate?

cnv • 1.9k views

ADD COMMENT • link updated 4.7 years ago by Biostar 20 • written 4.8 years ago by vctrm67 ▴ 80

You need the MuTect output to call regions of loss of heterozygosity (LOH). Further information is in this post: "What does BAF mean?"

ADD REPLY • link 4.8 years ago by Masood ▴ 20

Ah. Would this be the raw, unfiltered output of Mutect? That is, Mutect labels calls as passing its filters, as germline risk, as found in panels, etc. Which set of SNVs is required for CNV calling?

ADD REPLY • link 4.8 years ago by vctrm67 ▴ 80

It depends on the tool you want to use. The standard is SNVs that were heterozygous in the germline.

ADD REPLY • link 4.8 years ago by German.M.Demidov ★ 2.9k

So for CNVkit, I see this message:


When a VCF file containing SNV calls for the same tumor sample (and optionally a matched normal) is given using the -v/--vcf option, the b-allele frequencies (BAFs) of the heterozygous, non-somatic SNVs falling within each segment are mirrored, averaged, and listed in the output .cns file as an additional “baf” column (using the same logic as export nexus-ogt). If --purity was specified, then the BAF values are also rescaled.

If a VCF file is given using the -v/--vcf option, then for each segment containing SNVs in the VCF, an average b-allele frequency (BAF) within that segment is calculated, and output in the “baf” column. Allele-specific integer copy number values are then inferred from the total copy number and BAF, and output in columns “cn1” and “cn2”. This calculation uses the same method as PSCBS: total copy number is multiplied by the BAF, and rounded to the nearest integer.

Allelic imbalance, including copy-number-neutral loss of heterozygosity (LOH), is then apparent when a segment’s “cn1” and “cn2” fields have different values.

I assume by "b-allele frequencies (BAFs) of the heterozygous, non-somatic SNVs" they require germline SNVs? Why not use GATK's HaplotypeCaller on a normal sample and use the output from that then?
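To make the quoted calculation concrete, here is a minimal Python sketch of the two steps as I read them: fold ("mirror") the BAFs around 0.5, average them per segment, then multiply the total copy number by that BAF and round. The function names are made up for illustration, and folding everything above 0.5 is my assumption; this is not CNVkit's actual code.

```python
def mirrored_mean_baf(bafs):
    """Fold each BAF around 0.5 and average, so 0.3 and 0.7 count the same.

    Assumption: values are folded to the side above 0.5.
    """
    if not bafs:
        return None
    folded = [0.5 + abs(b - 0.5) for b in bafs]
    return sum(folded) / len(folded)


def allele_specific_cn(total_cn, baf):
    """Split a total integer copy number into (cn1, cn2) using the segment BAF."""
    cn1 = int(total_cn * baf + 0.5)  # round half up; inputs are non-negative
    cn2 = total_cn - cn1
    return cn1, cn2


# Made-up tumor BAFs at germline-heterozygous SNPs inside one segment.
segment_bafs = [0.31, 0.68, 0.72, 0.29]
mean_baf = mirrored_mean_baf(segment_bafs)
print(allele_specific_cn(3, mean_baf))
```

With a total copy number of 2 and a mirrored BAF close to 1, the same arithmetic gives cn1 = 2 and cn2 = 0, which is the copy-number-neutral LOH case the documentation mentions.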

ADD REPLY • link 4.8 years ago by vctrm67 ▴ 80

Yes, you got it right :) And yes, you can use the GATK caller to call variants from the normal sample. But what you are actually interested in is how the BAFs of germline variants changed in the tumor, so you need to "call" these variants in the tumor and compare them with what you saw in the normal.

https://cnvkit.readthedocs.io/en/stable/baf.html
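A rough Python sketch of that comparison, in case it helps: pick sites that look heterozygous in the normal, then read off the BAF at the same sites in the tumor. The input layout (dicts of allele counts), the 0.3 to 0.7 heterozygosity window, and the depth cutoff are all assumptions for illustration, not anything CNVkit or GATK prescribes.

```python
def baf(ref_count, alt_count):
    """B-allele frequency: fraction of reads supporting the alternate allele."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else None


def germline_het_shifts(normal, tumor, het_range=(0.3, 0.7), min_depth=10):
    """Return {site: (normal_baf, tumor_baf)} for germline-heterozygous sites.

    normal, tumor: dicts mapping (chrom, pos) -> (ref_count, alt_count).
    """
    shifts = {}
    for site, (n_ref, n_alt) in normal.items():
        if n_ref + n_alt < min_depth or site not in tumor:
            continue
        n_baf = baf(n_ref, n_alt)
        if not het_range[0] <= n_baf <= het_range[1]:
            continue  # not heterozygous in the germline, so uninformative
        t_baf = baf(*tumor[site])
        if t_baf is not None:
            shifts[site] = (n_baf, t_baf)
    return shifts


# Made-up allele counts for three sites.
normal = {("chr1", 100): (20, 20), ("chr1", 200): (40, 0), ("chr1", 300): (15, 15)}
tumor = {("chr1", 100): (36, 4), ("chr1", 200): (40, 0), ("chr1", 300): (16, 14)}
for site, (n_baf, t_baf) in germline_het_shifts(normal, tumor).items():
    print(site, round(n_baf, 2), round(t_baf, 2))
```

In this toy data the first site drifts far from 0.5 in the tumor, which is the kind of signal that points to allelic imbalance, while the homozygous site is dropped because it carries no information.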

ADD REPLY • link 4.8 years ago by German.M.Demidov ★ 2.9k

Yes, the unfiltered ones. CNVkit will filter out the variants that carry the somatic tag in the Mutect2 calls.
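For illustration only, this is roughly what "drop the somatic-tagged records" means at the VCF level. The sketch assumes the caller marks somatic records with a `SOMATIC` flag in the INFO column, which you should check in your own file's header; the function name is hypothetical and this is not CNVkit's implementation.

```python
def non_somatic_records(vcf_lines):
    """Yield VCF data lines whose INFO column lacks a SOMATIC flag.

    Assumption: somatic calls are marked with a bare SOMATIC key in INFO.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record: no INFO column
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        if "SOMATIC" not in info_keys:
            yield line


# Made-up records: one germline-looking, one tagged somatic.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=40\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=35;SOMATIC\n",
]
for record in non_somatic_records(example):
    print(record, end="")
```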

ADD REPLY • link 4.8 years ago by Masood ▴ 20

I just ran CNVkit with and without a VCF input, and the calls look the same apart from a few extra columns of information in the calls made with the VCF. Is there a reason? I don't know too much about it, but I would have thought that the VCF itself would change some of the calls.

ADD REPLY • link 4.8 years ago by vctrm67 ▴ 80

Basically, you are looking for regions of LOH and allelic imbalance using the SNP information. Different values in the extra cn1 and cn2 columns mean that the segment is a region of LOH or allelic imbalance. So there will be three types of structural variants: copy-number-gain LOH, copy-number-loss LOH, and copy-number-neutral LOH.
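As a rough sketch of how those categories fall out of the cn1 and cn2 columns, assuming a diploid baseline of 2 copies (the labels and the function name are mine, not CNVkit output):

```python
def classify_segment(cn1, cn2, normal_cn=2):
    """Label a segment from its allele-specific copy numbers.

    Assumption: normal_cn=2, i.e. an autosome in a diploid genome.
    """
    total = cn1 + cn2
    minor = min(cn1, cn2)
    if total == 0:
        return "homozygous deletion"
    if minor == 0:  # one allele is completely lost
        if total > normal_cn:
            return "copy-number-gain LOH"
        if total < normal_cn:
            return "copy-number-loss LOH"
        return "copy-number-neutral LOH"
    if cn1 != cn2:
        return "allelic imbalance, both alleles retained"
    return "balanced"


# Made-up (cn1, cn2) pairs.
for cn1, cn2 in [(1, 1), (2, 0), (3, 0), (1, 0), (2, 1)]:
    print(cn1, cn2, classify_segment(cn1, cn2))
```

Note that the (2, 0) segment has the same total copy number as a normal (1, 1) segment, so it only shows up once the BAF-derived columns are available.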

ADD REPLY • link 4.8 years ago by Masood ▴ 20