Hello, I have a VCF file containing ~900K SNPs, some of which are tri-allelic. I used bcftools to split the tri-allelic SNPs into biallelic records, and then tried to remove the duplicated SNP IDs that the split produced. I used bcftools for that as well, but the command is not removing the duplicated SNP IDs. I used the following command to split the tri-allelic SNPs:
bcftools norm -m-any --output output.vcf input.vcf
To remove the duplicate SNPs, I used the following command:
bcftools norm --remove-duplicates --output output.vcf input.vcf
Can anyone please suggest how to resolve this issue?
I believe you have to tell bcftools which record type to remove by passing it as the argument to --rm-dup. Can you please try this:
bcftools norm --rm-dup snps --output output.vcf input.vcf
I just ran a test here and it worked. I'm using bcftools v1.10.2.
I have tried this with both snps and all, but it still gives the same output.
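As far as I can tell from the bcftools documentation, --rm-dup decides what is a duplicate from the position and alleles of a record, not from the ID column. If the goal is specifically to keep only one record per SNP ID, a plain awk filter may be an alternative. The following is only a minimal sketch: the file names split_example.vcf and dedup_example.vcf and the toy records are made up for illustration, it assumes an uncompressed, tab-delimited VCF, and it keeps whichever record with a given ID appears first.

```shell
# Hypothetical toy input: after splitting, two records share the ID rs1.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tC\t.\t.\t.\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tT\tC\t.\t.\t.\n' > split_example.vcf

# Print header lines, records whose ID is missing ("."),
# and only the first record seen for each ID in column 3.
awk -F'\t' '/^#/ || $3 == "." || !seen[$3]++' split_example.vcf > dedup_example.vcf
```

Note that this drops the second allele of each split site entirely, so it only makes sense if losing those alternate alleles is acceptable for the downstream analysis.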