One answer is that multi-allelic calls are difficult to manage from an analysis perspective. In an association test, as in a GWAS, each site (variant) is analysed independently, so it makes no difference to split a multi-allelic call into separate bi-allelic calls and test each one separately.
Also, in a germline sample, it makes little sense, biologically, that a multi-allelic call would even be present, unless our VCF contains data from more than 1 individual. On the other hand, in a cancer context, considering a bulk tumour biopsy sample, we would expect many multi-allelic calls.
You will undoubtedly find many more opinions online via a search.
Kevin
Edit: to add information based on chrchang's response, we can tolerate multi-allelic calls depending on how we code them. Consider this call:
Ref: A
Var: T,G
Var allele counts: 56,2
We can potentially sum the allele counts of the variant alleles and regard the site as bi-allelic, giving a total of 58, or split it into 2 calls, one for T (56) and one for G (2).
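As a rough illustration of the two codings described above, here is a minimal Python sketch using the example call (Ref A; Var T,G; counts 56,2). The function names `collapse_alt_counts` and `split_alt_counts` are hypothetical and made up for this example; they are not part of plink2 or any other tool, and real VCF handling involves genotype fields that this sketch ignores.

```python
def collapse_alt_counts(alts, counts):
    """Treat all variant alleles as one 'non-reference' allele and sum their counts."""
    return "/".join(alts), sum(counts)


def split_alt_counts(ref, alts, counts):
    """Split one multi-allelic call into one bi-allelic (ref, alt, count) record per variant allele."""
    return [(ref, alt, count) for alt, count in zip(alts, counts)]


# The example call from the post
ref = "A"
alts = ["T", "G"]
counts = [56, 2]

collapsed = collapse_alt_counts(alts, counts)
split = split_alt_counts(ref, alts, counts)

print(collapsed)  # ('T/G', 58)
print(split)      # [('A', 'T', 56), ('A', 'G', 2)]
```

Collapsing discards the distinction between T and G, whereas splitting keeps each variant allele's count but treats the two records as if they were independent sites.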
Note that splitting a multi-allelic call does change plink2 --glm, plink2 --pca, plink2 --hwe, and quite a few other results. All should be more accurate if the call is not split. (With that said, sometimes you have no choice: e.g. if downstream pipeline steps can't handle multiallelic --glm output, go ahead and split first.)
Thanks for the additional information, chrchang! I had figured that this was a question with no single answer.