I ran the VariantsToTable tool of GATK as follows:
gatk VariantsToTable -V 001.vcf -F CHROM -F POS -F ID -F REF -F ALT -F QUAL -F FILTER -F INFO -F FORMAT -F Father -F Mother -F Child -F ADP -F STATUS -F CSQ -GF GT -GF GQ -GF SDP -GF DP -GF RD -GF AD -GF FREQ -GF PVAL -GF RBQ -GF ABQ -GF RDF -GF RDR -GF ADF -GF ADR -O 001.table
It failed with the following error:
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 92283: Duplicate allele added to VariantContext: C, for input source: 001.vcf
I checked what is on that line:
sed -n 92283p 001.vcf
The output was:
chr1 35816798 . CAAAAAAAAAAAAA C,C . PASS ADP=20;STATUS=2;CSQ=-|intron_variant|MODIFIER|AGO4|ENSG00000134698|Transcript|ENST00000373210|protein_coding||1/17||||||||||1||HGNC|HGNC:18424|YES|CCDS397.1|| GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:51:22:22:7:13:59.09%:6.4422E-6:38:47:6:1:13:0 0/2:20:27:27:13:6:24%:9.828E-3:38:39:13:0:6:0 0/2:22:13:13:5:6:46.15%:6.192E-3:41:38:5:0:6:0
To reformat the VCF file I ran:
awk -F '\t' '/^#/ {print;next;} {OFS="\t";R=$4;n=split($5,a,/[,]/);s="";for(i=1;i<=n;i++) {s=sprintf("%s%s%s%s",s,(i==1?"":","),a[i],a[i]==R?"AAAAAAAAA":"");} $5=s; print;}' 001.vcf > 001new.vcf
I then ran VariantsToTable again on 001new.vcf, but it still shows the same error.
When I removed that particular line from the VCF, the error moved to another line. What I noticed is that the ALT column lists the same allele twice, here C,C.
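To see how many records are affected, something like the following might help. It is only a minimal sketch, assuming an uncompressed, tab-separated VCF with ALT in column 5; list_dup_alt is a name made up here, not part of GATK. It prints the file line number, CHROM, POS, REF and ALT of every record whose ALT column repeats an allele:

```shell
# Hypothetical helper: list records whose ALT column (field 5) repeats an allele.
# Assumes an uncompressed, tab-separated VCF passed as the first argument.
list_dup_alt() {
  awk -F '\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($5, alt, ",")            # split ALT into its alleles
      split("", seen)                    # portable way to clear the array per record
      for (i = 1; i <= n; i++) {
        if (alt[i] in seen) {            # this allele already appeared on the line
          print NR ": " $1 "\t" $2 "\t" $4 "\t" $5
          break
        }
        seen[alt[i]] = 1
      }
    }
  ' "$1"
}
```

It would be run as list_dup_alt 001.vcf. The number printed before the colon is the line number in the file (header lines included), so it should be comparable to the line number in the GATK error message and to sed -n.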
Any help is appreciated.
I tried this, but I still get the same error.