This might be a niche question, but here it goes anyway.
EDIT: As DBScan points out below, this issue arises when polyploid cells come into play; it has nothing to do with multi-allelic variants.
Original question:
I know how bcftools norm and vt decompose both decompose/split multi-allelic variants to biallelic records. While they split multiple ALTs, they don't do justice to the GT fields of records where n(ALT)>2.
For example, a triallelic variant gets the following possible GTs: ././. , 0/./. , ./0/. , 0/1/. , and 0/./1 . The last two are HETs while all others are HOM-REFs (theoretically at least). Also, not all GTs are triallelic like the above. Some are biallelic, which is difficult (for me) to interpret. Ideally, splitting multi-allelics to biallelics should also pick corresponding GTs so all GTs are biallelic as well.
Does anyone know how to make this happen? If not, is there any way to tell the HETs and HOM-ALTs apart from the HOM-REFs? As of now, the only solution I can think of is to run bcftools +missing2ref, then import the GTs into R (or some other programming environment), split each GT on / and sum the alleles, and call a sum of 0 HOM-REF and a sum of 1 HET.
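For what it's worth, here is a minimal sketch of that idea in Python rather than R. The function name classify_gt is made up for illustration, and it assumes that every missing allele (.) can be treated as REF, which is what the missing-to-ref step would do but may not be biologically justified. Instead of a plain sum it counts non-REF alleles, so that genotypes of any ploidy, and ones carrying ALT indices above 1, are handled too.

```python
def classify_gt(gt: str) -> str:
    """Classify a VCF GT string of any ploidy as HOM-REF, HET or HOM-ALT.

    Assumption: missing alleles ('.') are treated as REF (allele 0).
    """
    # Accept both unphased (/) and phased (|) separators.
    alleles = gt.replace("|", "/").split("/")
    # Treat '.' as allele 0; everything else is an integer allele index.
    calls = [0 if a == "." else int(a) for a in alleles]
    n_alt = sum(1 for a in calls if a != 0)
    if n_alt == 0:
        return "HOM-REF"
    # HOM-ALT only if every allele is the same non-REF allele.
    if n_alt == len(calls) and len(set(calls)) == 1:
        return "HOM-ALT"
    return "HET"


if __name__ == "__main__":
    for gt in ["././.", "0/./.", "./0/.", "0/1/.", "0/./1", "1/1/1", "0/1"]:
        print(gt, classify_gt(gt))
```

With the GTs listed above, the first three come out as HOM-REF and the last two as HET. Whether a triploid call like 0/1/. should really count as HET is a judgment call, not something the code can settle.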
Aren't your multiallelic GTs already diallelic? The only case I can think of that would give a genotype like ././. is a triploid species.
Example of a triallelic site:
You're absolutely right - I lost sight of that.
I'm working with cancer cells, so polyploidy is a possibility. Is there any way to address that?