Question

VCF Format: How to encode LOH regions and mosaic copy number regions

5.2 years ago

ed.erwin • 0

My software group has been required to write VCF files containing, among other things, regions of LOH (Loss of Heterozygosity) and copy number regions.

I would like to know how to do this in the most standard way, if there is one.

We know that copy number variants are usually described this way:

##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">

But we make copy number calls that are not always integers. For example, "2.5" could indicate a mosaic copy number where half of the sample has CN=2 and half has CN=3. (By itself, 2.5 is ambiguous: it could be any mixture that averages out to 2.5 copies. But this number represents all the information we have available.)
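
To make the ambiguity concrete, here is a minimal Python sketch. The function name `mean_copy_number` and the two example mixtures are made up for illustration; each mixture is a list of (cell fraction, copy number) pairs.

```python
def mean_copy_number(mixture):
    """Average copy number for a list of (cell_fraction, copy_number) pairs."""
    total_fraction = sum(fraction for fraction, _ in mixture)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("cell fractions must sum to 1")
    return sum(fraction * cn for fraction, cn in mixture)


# Two hypothetical samples with different underlying mosaics.
half_single_gain = [(0.5, 2), (0.5, 3)]       # half the cells gained one copy
quarter_double_gain = [(0.75, 2), (0.25, 4)]  # a quarter gained two copies

print(mean_copy_number(half_single_gain))
print(mean_copy_number(quarter_double_gain))
```

Both mixtures give the same average, so the average alone cannot tell them apart.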

Can we change the FORMAT definition for CN to allow Type=Float? Or should we create a different FORMAT field, such as CNM (CN Mosaic)? Is there a standard FORMAT for LOH regions?
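
For reference, the second option could look roughly like the sketch below. This is only an illustration, not a standard: CNM is a hypothetical field ID, its description text is invented here, and the helper names `format_header` and `cnv_record` are made up. It assumes the region is written with the `<CNV>` symbolic allele plus SVTYPE and END in INFO.

```python
def format_header(field_id, number, value_type, description):
    """Build a VCF ##FORMAT meta-information line."""
    return (
        f"##FORMAT=<ID={field_id},Number={number},"
        f'Type={value_type},Description="{description}">'
    )


def cnv_record(chrom, pos, end, mean_cn):
    """Build one tab-separated VCF data line carrying a float copy number.

    CNM is a hypothetical FORMAT key, not part of the VCF specification.
    """
    info = f"SVTYPE=CNV;END={end}"
    columns = [chrom, str(pos), ".", "N", "<CNV>", ".", "PASS",
               info, "CNM", f"{mean_cn:g}"]
    return "\t".join(columns)


header = format_header(
    "CNM", 1, "Float",
    "Mean copy number across the sample (hypothetical field)",
)
record = cnv_record("chr1", 1000000, 2000000, 2.5)
print(header)
print(record)
```

Declaring a new key this way leaves the existing integer CN definition untouched for tools that already parse it.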

VCF • 1.8k views

More a thought than an answer: copy number is actually an integer in any single cell; what matters here is the percentage of cells in which the change occurred. CN 2.5 is ambiguous, but you can usually resolve it by looking at the B-allele frequency (unless it is a tumor sample where several CNAs overlap in the same region). If you do end up developing your own format, I'd keep the following fields: the cell fraction in which the variant occurred, the copy number of allele 1, and the copy number of allele 2.
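
A rough Python sketch of those three fields follows. The class name, the `CF:CN1:CN2` layout, and the assumption that unaffected cells are diploid (`background_cn=2`) are all hypothetical choices made for illustration, not an existing convention.

```python
from dataclasses import dataclass


@dataclass
class AlleleSpecificCall:
    cell_fraction: float  # fraction of cells carrying the change
    cn_allele1: int       # copy number of allele 1 in those cells
    cn_allele2: int       # copy number of allele 2 in those cells

    def sample_column(self):
        """Sample value for a hypothetical FORMAT string 'CF:CN1:CN2'."""
        return f"{self.cell_fraction:g}:{self.cn_allele1}:{self.cn_allele2}"

    def mean_copy_number(self, background_cn=2):
        """Sample-wide average, assuming unaffected cells have background_cn."""
        altered = self.cn_allele1 + self.cn_allele2
        return (self.cell_fraction * altered
                + (1 - self.cell_fraction) * background_cn)

    def is_loh(self):
        """True when one allele is lost entirely and the other remains."""
        return (min(self.cn_allele1, self.cn_allele2) == 0
                and max(self.cn_allele1, self.cn_allele2) > 0)


mosaic_gain = AlleleSpecificCall(0.5, 1, 2)
copy_neutral_loh = AlleleSpecificCall(1.0, 2, 0)
print(mosaic_gain.sample_column(), mosaic_gain.mean_copy_number())
print(copy_neutral_loh.sample_column(), copy_neutral_loh.is_loh())
```

With fields like these, a copy-neutral LOH region and a mosaic gain can be written in the same record layout, and the average copy number can still be derived when needed.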

5.2 years ago by German.M.Demidov ★ 2.9k

Unfortunately, we do not have access to information on the fractions of cells with different copy numbers; we only have the average copy number. So I'm looking for a way to encode the information that we do have in VCF format. Specifically, I'm trying to determine whether changing the FORMAT field to define "CN" as Float rather than Integer is acceptable, or whether creating a different FORMAT field such as "CNM" is a better idea.

5.2 years ago by ed.erwin • 0