Hi all,
I'm working with a VCF (v4.1) in which the deletions are incorrectly formatted for some reason. The insertions are fine, but the deletions are annotated like this (example):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
2 32474671 indel.60227 A - . PASS . GT
Notice that the ALT is "-", when the line should have been formatted like this (example):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
2 32474670 indel.60227 GA G . PASS . GT
I have no idea how the deletions ended up like this in the VCF, but my present plan is to look up these positions in a reference genome FASTA file and manually correct all the deletion annotations, so I don't have to drop them from the VCF. What I wanted to know is whether there's a tool that already does this; as it stands, I'm writing a manual parser.
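For anyone landing here with the same problem, the manual correction described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the function name pad_deletion is hypothetical, the records are assumed to be tab-delimited, and the reference is assumed to be already loaded as a dict of chromosome name to sequence string (for a real genome you would want an indexed FASTA reader instead). It moves POS back by one and prepends the preceding reference base to both REF and ALT.

```python
def pad_deletion(line, reference):
    """Rewrite a VCF record whose ALT is '-' as a left-padded deletion.

    `reference` is assumed to be a dict mapping chromosome name to its
    sequence string. Records with any other ALT are returned unchanged.
    """
    if line.startswith("#"):
        return line  # header lines pass through untouched
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    if alt != "-":
        return line  # not one of the malformed deletions
    if pos < 2:
        # No preceding base to anchor on; this sketch does not handle that case.
        raise ValueError(f"cannot left-pad deletion at {chrom}:{pos}")
    seq = reference[chrom]
    # Sanity check: the deleted bases should match the reference (POS is 1-based).
    observed = seq[pos - 1:pos - 1 + len(ref)].upper()
    if observed != ref.upper():
        raise ValueError(f"REF mismatch at {chrom}:{pos}: {ref} vs {observed}")
    anchor = seq[pos - 2].upper()  # base immediately before the deletion
    fields[1] = str(pos - 1)       # POS now points at the anchor base
    fields[3] = anchor + ref       # REF = anchor + deleted bases
    fields[4] = anchor             # ALT = anchor only
    return "\t".join(fields) + newline


# Toy reference: position 3 is G and position 4 is A on chromosome "2".
reference = {"2": "CCGATT"}
record = "2\t4\tindel.60227\tA\t-\t.\tPASS\t.\tGT"
print(pad_deletion(record, reference))  # POS becomes 3, REF GA, ALT G
```

The REF check is there so that a wrong genome build or an off-by-one in the input fails loudly instead of silently producing a bad anchor base.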
It is quite odd that insertions are fine and deletions are not. Older VCF versions (4.1) had "." for REF in insertions and "." for ALT in deletions, so either both should be affected or neither should be.
Maybe give this tool a shot? Disclaimer: This tool is not mine and I have never used it. Maybe bcftools norm --check-ref can fix the REF alleles, though I'm not sure.
I definitely agree that it's odd. I'm having trouble finding older versions that used a "-" as ALT in deletions, so I'm not sure it's ever the case. A big part of this problem is that I can't figure out how the people who supplied the VCF ended up in this situation.
bcftools doesn't seem to fix the problem, probably because the REF alleles are fine; it's the ALT alleles that are botched.
Looking into the other tool that you linked. Hopefully it helps.
I don't know of any tool that uses "-"; older versions used ".", not "-".