vcf-validator could not parse the alleles
9 months ago
ekirsch

Hello,

I am trying to use the vcf-validator tool from VCFtools.

My goal is to use PLINK to convert my VCF into PLINK-format files (ped/map, or the binary bed/bim/fam set that --make-bed writes) and then run runs-of-homozygosity (ROH) analyses on them, also in PLINK. However, when I run this command

plink2 --vcf alaudinus.vcf.gz --make-bed --out alaudinus --allow-extra-chr

I get this error:

Error: Line 37274623 of --vcf file has fewer tokens than expected.
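One way to check whether that line really is truncated, independently of PLINK, is to compare every data line's tab-separated field count with the `#CHROM` header. Below is a minimal sketch, assuming a standard tab-delimited VCF (plain or bgzip-compressed). The function names are hypothetical, and the assumption that its 1-based line numbers (which include the `##` meta lines) match PLINK's "Line N" numbering is untested.

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # bgzip output is valid multi-member gzip
    return open(path)


def find_bad_lines(lines):
    """Yield (line_number, n_fields, expected) for every data line whose
    tab-separated field count differs from the #CHROM header line.

    Line numbers are 1-based and include the ## meta lines; expected is
    None if a data line appears before any #CHROM header.
    """
    expected = None
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("##"):
            continue  # meta-information lines have no fixed column count
        if line.startswith("#CHROM"):
            expected = line.count("\t") + 1
            continue
        n_fields = line.count("\t") + 1
        if expected is None or n_fields != expected:
            yield line_number, n_fields, expected


if __name__ == "__main__":
    with open_vcf(sys.argv[1]) as handle:
        for line_number, got, want in find_bad_lines(handle):
            print(f"line {line_number}: {got} fields, expected {want}")
```

Run it as `python check_fields.py alaudinus.vcf.gz` (the script name is hypothetical). A report near line 37274623 would point to a damaged record there; no report means the field counts are consistent and the cause lies elsewhere.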

Because some online forums suggested that this may be caused by file corruption, I tried running vcf-validator, and it prints many repeated lines like this:

JAKOOL010000001.1:138 .. Could not parse the allele(s) [*]

I thought this might come from an error in how I subset my large VCF into smaller ones (one per subspecies), but when I run the same command on the original, unmodified VCF, I get the same error. I am fairly confident that I downloaded the file completely and correctly, and when I inspect it with less it looks fine, with no obvious signs of corruption.

My question is: does this output from vcf-validator necessarily mean that the file is corrupted, or could it mean that vcf-validator simply cannot read this kind of VCF? I am unsure how vcf-validator works in detail (for example, whether it can read a VCF with multiallelic sites).

To give some background, I am working with a VCF for a non-model organism whose assembly is in scaffolds, so the contig names are scaffold accessions such as JAKOOL010000001.1.
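The multiallelic question can be checked empirically by tallying, over the ALT column, how many records are multiallelic and how many contain the `*` allele that vcf-validator reports. Below is a minimal sketch, assuming a tab-delimited, gzip/bgzip-compressed VCF; the function name and the count labels are made up for illustration.

```python
import gzip
import sys
from collections import Counter


def tally_alt_alleles(lines):
    """Count data records, multiallelic records (more than one ALT allele)
    and records whose ALT list contains the '*' allele."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta lines and the #CHROM header
        fields = line.rstrip("\r\n").split("\t", 5)  # ALT is column 5 (index 4)
        if len(fields) < 5:
            counts["too_few_columns"] += 1
            continue
        alts = fields[4].split(",")
        counts["records"] += 1
        if len(alts) > 1:
            counts["multiallelic"] += 1
        if "*" in alts:
            counts["has_star_allele"] += 1
    return counts


if __name__ == "__main__":
    with gzip.open(sys.argv[1], "rt") as handle:  # assumes a .vcf.gz input
        for key, value in sorted(tally_alt_alleles(handle).items()):
            print(key, value)
```

If the number of `*` records roughly matches the number of repeated vcf-validator messages, that would suggest the validator is rejecting the allele symbol rather than detecting file damage, although this is only circumstantial evidence.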


How did you generate the VCFs? I suspect GATK, because I've had similar issues in the past. Some tools fail to handle the * allele that GATK emits, which in VCF v4.3 means an "allele missing due to overlapping deletion". There is a GATK forum page on this.

There is also an older post that may describe a similar problem, although I am not sure whether it was resolved.
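To make the `*` concrete: a genotype indexes into REF followed by the comma-separated ALT list, so at a multiallelic site a sample lying inside an overlapping deletion can be called with `*` as one of its alleles instead of a base string. Below is a minimal sketch with a hypothetical helper and a made-up record, showing how GT indices map to alleles.

```python
def decode_genotype(ref, alt, gt):
    """Map a GT string such as '0/2' or '1|0' to allele strings.

    Index 0 is REF and 1..n index the comma-separated ALT list, so with
    ALT = 'G,*' the index 2 refers to the '*' (overlapping deletion) allele.
    A '.' call is returned as None. Indices are not range-checked.
    """
    alleles = [ref] + alt.split(",")
    calls = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    return [None if call == "." else alleles[int(call)] for call in calls]


# Hypothetical record: REF=A, ALT=G,* and a sample genotype of 0/2.
print(decode_genotype("A", "G,*", "0/2"))  # ['A', '*']
```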


Yes, I used GATK.

Thank you for your comment! I did not know the * was used to indicate overlapping deletions.

Does anyone know whether PLINK is one of the programs that cannot handle VCF files with this notation? If so, would it be fine to go ahead with the PLINK analysis after removing all sites with the * allele?
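If you do remove those sites, a minimal Python sketch of the filter is below. It assumes a gzip/bgzip-compressed input and writes uncompressed VCF to standard output; the script and function names are hypothetical. It drops whole records, so a site with ALT `G,*` also loses its real `G` allele. A gentler alternative is to split multiallelic records first (for example with `bcftools norm -m -`) and then drop only the `*` rows. Which approach suits an ROH analysis is a judgment call.

```python
import gzip
import sys


def drop_star_records(lines):
    """Yield header lines and every data record whose ALT column has no '*'.

    Whole records are removed, including any real ALT alleles they carry.
    Lines with fewer than five columns are passed through unchanged so that
    malformed records are not silently hidden.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM header as-is
            continue
        fields = line.rstrip("\r\n").split("\t", 5)
        if len(fields) < 5 or "*" not in fields[4].split(","):
            yield line


if __name__ == "__main__":
    # Usage: python drop_star_records.py in.vcf.gz > out.vcf
    with gzip.open(sys.argv[1], "rt") as handle:
        sys.stdout.writelines(drop_star_records(handle))
```

The resulting file can then be passed to the same `plink2 --vcf ... --make-bed --allow-extra-chr` command.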
