How to keep only biallelic SNPs in a VCF file
3.7 years ago
wangh920

Does anyone know how to keep only biallelic SNPs in a VCF file?

I know that the post "how to remove multiallelic from VCF" did the job, but I still have biallelic variants whose REF and ALT alleles are several (three or four) bases long, such as:

CHROM   POS     ID  REF   ALT
chr1.1  294553  .   GGCA  GGCG

Thanks a lot in advance.

Hu

Hello,

This is not a multiallelic variant: the reference allele is GGCA and there is only one alternative allele, GGCG, at this position, so it is actually a biallelic variant. A multiallelic variant lists two or more comma-separated alternative alleles, for example:

chr1 294553 . GGCA GGCG,GGGA

chr1 294553 . G C,A
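
The distinction comes down to the ALT column: a biallelic record has one ALT allele, a multiallelic record has several. As a minimal illustration (the function name `count_alleles` is hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF on standard input), this awk snippet prints the total number of alleles per record:

```shell
#!/bin/sh
# Hypothetical helper: print CHROM, POS and the total number of alleles
# (REF + ALT) for each record of an uncompressed, tab-delimited VCF on stdin.
# 2 alleles = biallelic, 3 or more = multiallelic, 1 = no ALT allele (ALT is ".").
count_alleles() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                      # skip header lines
    {
      n_alt = ($5 == ".") ? 0 : split($5, alts, ",")   # ALT alleles are comma-separated
      print $1, $2, 1 + n_alt                          # +1 for the REF allele
    }'
}

# The record from the question, then a multiallelic record from the reply above
printf 'chr1.1\t294553\t.\tGGCA\tGGCG\nchr1\t294553\t.\tG\tC,A\n' | count_alleles
```

It reports 2 alleles for the GGCA/GGCG record and 3 for the G C,A record; the length of the REF and ALT strings plays no role in whether a record is biallelic.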

Thanks a lot for your reply. You are right, it is indeed a biallelic variant. Do you know how this happens? Why is it not represented as A (REF) and G (ALT), but as GGCA (REF) and GGCG (ALT) in this case?

Best regards,

Hu

Hi,

This depends on the strategy each variant caller uses to represent variants. You can try to normalize your VCF after decomposing it (splitting multiallelic records into biallelic ones). There are many tools for this, and some of them both split multiallelic variants and normalize them: bcftools norm is one; vt normalize and GATK LeftAlignAndTrimVariants are others.
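
Part of normalization is trimming bases that REF and ALT share. In the record from the question, GGCA and GGCG share the leading GGC, so the only difference is A to G at position 294553 + 3 = 294556. The sketch below (hypothetical function `trim_alleles`, plain awk, uncompressed tab-delimited VCF assumed) performs only this trimming step on biallelic records. It does not left-align indels, which needs the reference sequence, so it is an illustration rather than a replacement for the tools named above.

```shell
#!/bin/sh
# Hypothetical helper illustrating the trimming step of normalization:
# for a biallelic record, drop bases shared by REF and ALT at the right end,
# then at the left end (shifting POS), always keeping at least one base per allele.
# Multiallelic, symbolic and missing ALT records are printed unchanged.
# Indels are NOT left-aligned (that requires the reference FASTA).
trim_alleles() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }                               # pass header lines through
    $4 !~ /^[ACGTNacgtn]+$/ || $5 !~ /^[ACGTNacgtn]+$/ { print; next }  # leave other records unchanged
    {
      ref = $4; alt = $5; pos = $2
      # right-trim: drop a shared last base while both alleles are longer than 1
      while (length(ref) > 1 && length(alt) > 1 &&
             substr(ref, length(ref), 1) == substr(alt, length(alt), 1)) {
        ref = substr(ref, 1, length(ref) - 1)
        alt = substr(alt, 1, length(alt) - 1)
      }
      # left-trim: drop a shared first base and move POS one base to the right
      while (length(ref) > 1 && length(alt) > 1 &&
             substr(ref, 1, 1) == substr(alt, 1, 1)) {
        ref = substr(ref, 2); alt = substr(alt, 2); pos++
      }
      $2 = pos; $4 = ref; $5 = alt
      print
    }'
}

# The record from the question
printf 'chr1.1\t294553\t.\tGGCA\tGGCG\n' | trim_alleles
```

For the question's record this prints POS 294556 with REF A and ALT G, which is the single-base representation asked about.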

I appreciate your help, desouzareis.r; I learned a lot from your reply.

3.7 years ago

Because it's an MNP, not an indel. Try:

bcftools view --types snps -m 2 -M 2 input.vcf
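
In this command, `--types snps` restricts the output to SNP records, and `-m 2 -M 2` require at least and at most two alleles (REF plus one ALT). As reported in the reply below, the GGCA/GGCG record still passed; one possible explanation, not verified here, is that bcftools determines the variant type after setting aside bases shared by REF and ALT, so a record that differs at a single position can be typed as a SNP. If you want a strict string-level definition instead, here is a minimal awk sketch (hypothetical function `keep_strict_biallelic_snps`; assumes an uncompressed, tab-delimited VCF) that keeps header lines and only records whose REF and ALT are each a single, different base:

```shell
#!/bin/sh
# Hypothetical helper: keep header lines plus records that are strict
# biallelic SNPs, i.e. exactly one ALT allele, and REF and ALT are each a
# single, different base. Expects an uncompressed, tab-delimited VCF on stdin.
# Multiallelic records fail the test because ALT contains a comma.
keep_strict_biallelic_snps() {
  awk -F'\t' '
    /^#/ { print; next }                               # keep all header lines
    $4 ~ /^[ACGTacgt]$/ && $5 ~ /^[ACGTacgt]$/ && toupper($4) != toupper($5)
  '
}

# Only the chr1:200 G>T record survives; GGCA>GGCG and G>C,A are dropped
printf '##fileformat=VCFv4.2\nchr1.1\t294553\t.\tGGCA\tGGCG\nchr1\t100\t.\tG\tC,A\nchr1\t200\t.\tG\tT\n' \
  | keep_strict_biallelic_snps
```

Records like GGCA/GGCG are dropped rather than converted; if you would rather keep them as SNPs, normalize first as suggested earlier in the thread. On a real file this would be `keep_strict_biallelic_snps < input.vcf > biallelic_snps.vcf`, or `zcat input.vcf.gz | keep_strict_biallelic_snps` for a compressed one.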

Thank you, Pierre, for your kind reply. I tried your command, but that variant is still there. (It is actually a biallelic variant, not an MNP.)

Best regards,

Hu
