I am normalizing some GWAS summary statistics to gnomAD.
gnomAD has some entries like these that look like duplicated indels:
chr21 13405435 rs140129927 G GT . PASS AC=2962;AN=148224;AF=0.0199833;popmax=afr;faf95_popmax=0.0636127;AC_non_v2_XX=1118;AN_non_v2_XX=59420>
chr21 13405435 rs140129927 GT G . PASS AC=40946;AN=148190;AF=0.276307;popmax=amr;faf95_popmax=0.419202;AC_non_v2_XX=16812;AN_non_v2_XX=59400
I realize these might be two different measurements, but for my purposes I really only need one (having both is breaking my pipeline).
How can I drop duplicate indels at the same position with the same REF/ALT alleles, keeping just one? I want to leave multiallelic SNVs untouched; the issue seems to be limited to indels.
Will bcftools norm --rm-dup indels do this? Is there anything I am missing?
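To make the behavior I am after concrete, here is a minimal Python sketch. It is only my description of the desired result, not a claim about how bcftools implements --rm-dup. It assumes tab-delimited VCF records, treats any record whose ALT length differs from its REF length as an indel, and keeps the first indel seen at each CHROM/POS. The function name drop_duplicate_indels, the shortened INFO fields, and the two SNV records are hypothetical and only there for illustration.

```python
def drop_duplicate_indels(vcf_lines):
    """Yield VCF lines, keeping only the first indel record per (CHROM, POS).

    Assumption: lines are tab-delimited and symbolic alleles such as <DEL>
    or * do not occur; those would need separate handling.
    """
    seen_indel_sites = set()
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], fields[1], fields[3]
        alts = fields[4].split(",")
        # A record counts as an indel if any ALT differs in length from REF.
        is_indel = any(len(alt) != len(ref) for alt in alts)
        if is_indel:
            site = (chrom, pos)
            if site in seen_indel_sites:
                continue  # drop second and later indels at this site
            seen_indel_sites.add(site)
        # SNVs (including multiallelic ones split over lines) are never dropped.
        yield line


records = [
    "chr21\t13405435\trs140129927\tG\tGT\t.\tPASS\tAF=0.0199833",
    "chr21\t13405435\trs140129927\tGT\tG\t.\tPASS\tAF=0.276307",
    "chr21\t13405500\t.\tA\tC\t.\tPASS\tAF=0.1",  # hypothetical SNV
    "chr21\t13405500\t.\tA\tG\t.\tPASS\tAF=0.2",  # hypothetical SNV
]
kept = list(drop_duplicate_indels(records))
for record in kept:
    print(record)
```

Note that "first seen" here means the G to GT record survives and the GT to G record is dropped, which may or may not be the one I actually want to keep.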
Follow-up question: how can those be the same variant with allele frequencies like that? It seems like an insertion of T and a deletion of T, both with G as the anchor, would have mirrored frequencies, not 0.0199833 for one and 0.276307 for the other.
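For reference, this is the arithmetic behind my confusion, using the AC and AN values from the two records above. The "mirrored" expectation (one AF being 1 minus the other) is my own assumption about how an insertion/deletion pair would relate, and the variable names are just for illustration.

```python
# AC and AN copied from the two gnomAD records above.
ins_ac, ins_an = 2962, 148224    # G -> GT
del_ac, del_an = 40946, 148190   # GT -> G

ins_af = ins_ac / ins_an
del_af = del_ac / del_an

# Assumption: if the records were mirror images of one variant,
# the insertion AF would be roughly 1 - deletion AF.
mirrored_af = 1 - del_af

print(f"insertion AF:          {ins_af:.6f}")
print(f"deletion AF:           {del_af:.6f}")
print(f"1 - deletion AF:       {mirrored_af:.6f}")
print(f"sum of the two AFs:    {ins_af + del_af:.6f}")
```

The two frequencies are nowhere near summing to 1, which is why I doubt these are simply the same event described in two directions.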