Hi,
I am working on a project that involves phasing SNVs and INDELs from whole-genome and exome sequencing data using SHAPEIT2. I followed the instructions from the 1000 Genomes Project (1000 genome guide) and ran into a problem with multiallelic variants. The instructions say:
“To phase both biallelic and multiallelic variants we first split the multiallelics into separate rows while left-aligning and normalizing INDELs using bcftools norm tool (Li, 2011). Next, we shifted the position of multiallelic variants (2nd, 3rd, etc ALT alleles) by 1 or more bp (depending on how many ALT alleles there are at a given position) to ensure a unique start position for all variants, which is required for SHAPEIT2. We shifted the positions back to the original ones after phasing.”
I don’t understand how to carry out the position-shifting step for multiallelic variants. How can I reliably identify which variants are multiallelic, and by how much should I shift each one? Is there a tool or script that does this automatically? What are the potential consequences of skipping this step?
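To make the question concrete, here is a minimal Python sketch of how I currently read that step. It is only my interpretation, not the 1000 Genomes pipeline itself. It assumes the multiallelics have already been split into separate rows (for example with bcftools norm -m -any), that the records are sorted by chromosome and position, and that a variant counts as "multiallelic" when several rows share the same CHROM and POS. The function names and the tuple layout are made up for illustration. Each row is moved to the next free position, so a row can also be pushed forward if a shifted ALT lands on its original position.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout for illustration: (chrom, pos, ref, alt)
Record = Tuple[str, int, str, str]


def shift_duplicate_positions(
    records: List[Record],
) -> Tuple[List[Record], Dict[Tuple[str, int], int]]:
    """Give every row a unique start position per chromosome.

    Assumes `records` is sorted by chromosome and position.
    Returns the shifted records plus a map from (chrom, shifted_pos)
    back to the original position, used to undo the shift later.
    """
    shifted: List[Record] = []
    original_pos: Dict[Tuple[str, int], int] = {}
    last_chrom = None
    last_pos = 0
    for chrom, pos, ref, alt in records:
        if chrom != last_chrom:
            # New chromosome: reset the "last used position" tracker.
            last_chrom, last_pos = chrom, 0
        # Keep pos if it is still free, otherwise take the next free bp.
        new_pos = max(pos, last_pos + 1)
        shifted.append((chrom, new_pos, ref, alt))
        original_pos[(chrom, new_pos)] = pos
        last_pos = new_pos
    return shifted, original_pos


def restore_positions(
    shifted: List[Record], original_pos: Dict[Tuple[str, int], int]
) -> List[Record]:
    """Undo the shift after phasing, using the saved position map."""
    return [(c, original_pos[(c, p)], r, a) for c, p, r, a in shifted]


if __name__ == "__main__":
    example = [
        ("1", 100, "A", "C"),
        ("1", 100, "A", "G"),  # 2nd ALT at the same site
        ("1", 100, "A", "T"),  # 3rd ALT at the same site
        ("1", 101, "G", "A"),  # biallelic, but 101 is now taken
        ("1", 250, "T", "TA"),
    ]
    shifted, pos_map = shift_duplicate_positions(example)
    for rec in shifted:
        print(rec)
    assert restore_positions(shifted, pos_map) == example
```

In a real run the position map would need to be written to disk before phasing and applied to the SHAPEIT2 output afterwards. Is this roughly the right idea, or is there a standard tool that already handles it?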
I would appreciate any help or advice.
Thank you very much!
Regards