VCF format: indel with two reference alleles
9.7 years ago

I'm struggling with making VarScan's output adhere to GATK's strict rules for VCF files. Currently, I cannot understand what is wrong with this line:

1    20571284    .    CAAAAAAA,AAAAAAAA    C    .

GATK says this cannot be parsed. Are two reference alleles not allowed, or is it supposed to be formatted differently?

VarScan GATK vcf • 3.0k views

EDIT: OK, what is the reference base at 20571284? Is it C or A or M or N?

9.7 years ago

No. The VCF spec (http://samtools.github.io/hts-specs/VCFv4.2.pdf) says that there is only one REF allele, while you can have one or more ALT alleles:

REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive). Multiple bases are permitted.

ALT: Comma separated list of alternate non-reference alleles called on at least one of the samples.
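As a rough illustration of the REF rule quoted above, here is a minimal Python sketch that checks whether a REF string is a single allele made only of A, C, G, T or N. The function name `ref_is_valid` is my own, not part of GATK or VarScan, and the check covers only this one rule, not everything GATK validates.

```python
import re

# A single allele: one or more of A, C, G, T, N, case-insensitive.
# A comma (a second allele) makes the string fail the match.
_REF_PATTERN = re.compile(r"[ACGTN]+", re.IGNORECASE)


def ref_is_valid(ref):
    """Return True if `ref` is one allele consisting only of A/C/G/T/N."""
    return _REF_PATTERN.fullmatch(ref) is not None


print(ref_is_valid("CAAAAAAA,AAAAAAAA"))  # False: two alleles in REF
print(ref_is_valid("CAAAAAAA"))           # True
```

The comma-separated REF from the question fails this check, which is consistent with GATK refusing to parse the line.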

To solve your problem, you can duplicate your VCF line:

1    20571284    .    AAAAAAAA    C    .
1    20571284    .    CAAAAAAA    C    .

But (as Heng said below) you'll also have to check the following statement of the spec about REF: "Strings must include the base before the event".
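The duplication step could be scripted along these lines. This is only a sketch: `split_multi_ref` is a hypothetical helper name, and it assumes tab-separated data lines with REF in the fourth column, as in a standard VCF. It does not decide which REF allele actually matches the reference genome, and it does not check that REF includes the base before the event, so the output still needs to be reviewed.

```python
def split_multi_ref(line):
    """Return one record per REF allele for a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    # Column 4 (index 3) is REF in the fixed VCF column order.
    ref_alleles = fields[3].split(",")
    records = []
    for allele in ref_alleles:
        new_fields = list(fields)  # copy so each record is independent
        new_fields[3] = allele
        records.append("\t".join(new_fields))
    return records


# The line from the question, written with tabs between columns.
line = "1\t20571284\t.\tCAAAAAAA,AAAAAAAA\tC\t."
for record in split_multi_ref(line):
    print(record)
```

A line whose REF has no comma comes back unchanged as a single record, so the function could be applied to every data line of the file.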


These are two different REF alleles, which cannot be right.


Alright, thank you. I suspected that was the case but didn't find anything that verified it.
