3.8 years ago
Volka
Hi all, I am looking at some data in my VCF file and came across this line below:
20 62855516 20:62855516:GC:AC GC AC . . PR;AC=8;AN=70 GT 0/0 0/0 0/0 0/0 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/1 0/1 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0
My question is: is this considered a multiallelic site, and how should I handle this entry? I also want to compare sites with another VCF, where the equivalent position has G as REF and A as ALT. Is there a way to clean the data so that only the first base of this entry is considered?
I've tried to remove or fix these entries with bcftools view -m2 -M2 -v snps and bcftools norm -m -any, but neither seems to catch it.
Thanks.
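Both parts of the question can be checked with a minimal Python sketch. It assumes "multiallelic" means the ALT column lists more than one comma-separated allele, and it assumes that trimming bases shared by REF and ALT (trailing first, then leading, always keeping at least one base) is an acceptable way to compare this record with the G/A call in the other VCF. The helper names parse_site, is_multiallelic and trim_alleles are hypothetical, not part of any library.

```python
def parse_site(line):
    """Split a whitespace-separated VCF data line into CHROM, POS, REF and the list of ALT alleles."""
    fields = line.split()
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    return chrom, pos, ref, alt.split(",")


def is_multiallelic(alts):
    """A site is multiallelic when ALT holds more than one comma-separated allele."""
    return len(alts) > 1


def trim_alleles(pos, ref, alt):
    """Drop bases shared at the end, then at the start, of REF and ALT.
    At least one base is always kept; POS shifts right for each leading base removed."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The record from the question, with most genotype columns left out for brevity.
line = "20 62855516 20:62855516:GC:AC GC AC . . PR;AC=8;AN=70 GT 0/0 0/1"
chrom, pos, ref, alts = parse_site(line)
print(is_multiallelic(alts))            # False: only one ALT allele (AC)
print(trim_alleles(pos, ref, alts[0]))  # (62855516, 'G', 'A')
```

Under these assumptions the record has a single ALT allele, so it is biallelic; REF and ALT just share a trailing C. Once that shared base is trimmed, the site reduces to G>A at 62855516, which is the representation used in the other VCF.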
Instead of removing these entries, try decomposing the VCFs with Vt. Refer to the "decompose biallelic block substitutions" section here: https://genome.sph.umich.edu/wiki/Vt
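The idea behind decomposing a biallelic block substitution can be illustrated with a short Python sketch. When REF and ALT have the same length, emit one SNP for each position where their bases differ. This is not Vt's implementation. The function decompose_blocksub and the set other_vcf_sites are hypothetical, and the sketch handles only CHROM/POS/REF/ALT. A real tool also has to carry INFO and genotype fields over to the new records.

```python
def decompose_blocksub(chrom, pos, ref, alt):
    """Split a biallelic block substitution (REF and ALT of equal length) into
    one SNP per differing position. Other records are returned unchanged."""
    if len(ref) != len(alt) or "," in alt:
        return [(chrom, pos, ref, alt)]
    return [
        (chrom, pos + i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# Hypothetical site taken from the second VCF, keyed by (CHROM, POS, REF, ALT).
other_vcf_sites = {("20", 62855516, "G", "A")}

snps = decompose_blocksub("20", 62855516, "GC", "AC")
shared = [site for site in snps if site in other_vcf_sites]
print(snps)    # [('20', 62855516, 'G', 'A')]
print(shared)  # [('20', 62855516, 'G', 'A')]
```

For GC>AC only the first base differs, so the record becomes a single G>A SNP at 62855516. After decomposition it matches the other VCF on (CHROM, POS, REF, ALT).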