3.7 years ago
wangh920
Does anyone know how to keep only biallelic SNPs in a VCF file?
I know the post "how to remove multiallelic from VCF" covers removing multiallelic sites, but I still have biallelic SNPs whose alleles are three or four bases long, such as:
CHROM POS ID REF ALT
chr1.1 294553 . GGCA GGCG
Thanks a lot in advance.
Hu
Hello,
This is not a multiallelic variant: at this position you have one reference allele (GGCA) and only one unique alternative allele (GGCG), so it is actually a biallelic variant. A multiallelic variant would look something like:
chr1 294553 . GGCA GGCG,GGGA
chr1 294553 . G C,A
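To make the distinction concrete, here is a minimal Python sketch that classifies a record from its REF and ALT columns alone. The function name `classify` is made up for illustration, and it assumes ALT holds plain nucleotide alleles separated by commas (no symbolic alleles such as `<DEL>`, `*` or `.`).

```python
def classify(ref: str, alt: str) -> str:
    """Classify one VCF record from its REF and ALT columns.

    Assumption: ALT contains only plain nucleotide alleles, comma-separated.
    """
    alts = alt.split(",")
    # More than one ALT allele on the same line means multiallelic.
    if len(alts) > 1:
        return "multiallelic"
    # One ALT allele, with REF and ALT both a single base: a plain SNP record.
    if len(ref) == 1 and len(alts[0]) == 1:
        return "biallelic SNP"
    # One ALT allele, but REF or ALT spans several bases.
    return "biallelic, multi-base alleles"


for ref, alt in [("GGCA", "GGCG"), ("GGCA", "GGCG,GGGA"), ("G", "C,A"), ("G", "C")]:
    print(ref, alt, "->", classify(ref, alt))
```

The record from the question falls into the last category: it is biallelic, but its alleles are written with more bases than needed.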
Thanks a lot for your reply. You are right, it is indeed a biallelic variant. Do you know how this happens? Why is it reported as GGCA (Ref) and GGCG (Alt) here instead of A (Ref) and G (Alt)?
Best regards,
Hu
Hi,
This is related to the strategy each variant caller uses to represent variants. You can try normalizing your VCF after decomposing it (keeping only biallelic variants). There are many tools for this, and some of them split multiallelic variants as well as normalize them. bcftools norm is one; vt normalize and GATK LeftAlignAndTrimVariants are others.
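As a rough illustration of the trimming part of normalization, the Python sketch below strips bases shared by REF and ALT and shifts POS accordingly. It is not the implementation used by any of the tools above: the function name `trim_alleles` is hypothetical, it handles a single ALT allele only, and it does not left-align indels, which requires the reference genome.

```python
def trim_alleles(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Assumption: a single ALT allele made of plain nucleotides.
    """
    # Drop shared trailing bases; POS does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases; each removed base moves POS right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(294553, "GGCA", "GGCG"))  # (294556, 'A', 'G')
```

For the record in the question, the three shared leading bases (GGC) are removed, leaving an A-to-G SNP three positions further along.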
I appreciate your help, desouzareis.r. I learned a lot from your reply.