Hi,
I have .vcf.gz files from GATK variant calling, and I'd like to annotate them with SnpEff in Galaxy. I keep getting the following error:
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/corral4/main/jobs/040/966/40966912/_job_tmp -Xmx7g -Xms256m
    Error: Error while processing VCF entry (line 176) :
        chr1 16890671 . TG CA 125.238 PASS AF=0.53012;AO=44;DP=84;FAO=44;FDP=83;FDVR=10;FR=.;FRO=39;FSAF=21;FSAR=23;FSRF=20;FSRR=19;FWDB=-0.0269551;FXX=0.0119048;HRUN=1;HS_ONLY=0;LEN=2;MLLD=170.568;OALT=CA;OID=.;OMAPALT=CA;OPOS=16890671;OREF=TG;PB=0.5;PBP=1;PPD=0;QD=6.03554;RBI=0.0269856;REFB=0.0133951;REVB=-0.00128241;RO=39;SAF=21;SAR=23;SPD=0;SRF=20;SRR=19;SSEN=0;SSEP=0;SSSB=-0.0302714;STB=0.516705;STBP=0.775;TYPE=mnp;VARB=-0.00753567 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:91:84:83:39:39:44:44:0.53012:23:21:20:19:23:21:20:19
    java.lang.StringIndexOutOfBoundsException: String index out of range: 3
    java.lang.StringIndexOutOfBoundsException: String index out of range: 3
        at java.lang.String.substring(String.java:1963)
        at org.snpeff.snpEffect.HgvsProtein.simplifyAminoAcidsLeft(HgvsProtein.java:395)
        at org.snpeff.snpEffect.HgvsProtein.simplifyAminoAcids(HgvsProtein.java:384)
        at org.snpeff.snpEffect.HgvsProtein.toString(HgvsProtein.java:491)
        at org.snpeff.snpEffect.VariantEffect.getHgvsProt(VariantEffect.java:633)
        at org.snpeff.vcf.VcfEffect.set(VcfEffect.java:1031)
        at org.snpeff.vcf.VcfEffect.<init>(VcfEffect.java:147)
        at org.snpeff.outputFormatter.VcfOutputFormatter.addInfo(VcfOutputFormatter.java:98)
        at org.snpeff.outputFormatter.VcfOutputFormatter.toString(VcfOutputFormatter.java:286)
        at org.snpeff.outputFormatter.OutputFormatter.endSection(OutputFormatter.java:112)
        at org.snpeff.outputFormatter.VcfOutputFormatter.endSection(VcfOutputFormatter.java:230)
        at org.snpeff.outputFormatter.OutputFormatter.printSection(OutputFormatter.java:145)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:292)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotateVcf(SnpEffCmdEff.java:468)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:142)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1029)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
My guess is that it has to do with the reference and alternate alleles each being 2 bases long rather than 1. How can I best resolve this?
Thank you!
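For context on that guess: a record like `TG` → `CA`, where REF and ALT have the same length (greater than one) and more than one base changes, is a multi-nucleotide polymorphism (MNP), and the failing record indeed carries `TYPE=mnp` and `LEN=2` in its INFO field. The stack trace also fits: the exception is thrown in `HgvsProtein.simplifyAminoAcidsLeft`, i.e. while SnpEff builds the protein-level HGVS string for this variant, not while parsing the file. Below is a rough sketch of that classification; the function name `classify_variant` is hypothetical, and it looks at one REF/ALT pair at a time.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Roughly classify one REF/ALT allele pair from a VCF record."""
    if len(ref) == len(alt):
        # Same length: count how many positions actually change.
        changed = sum(1 for r, a in zip(ref, alt) if r != a)
        if changed == 0:
            return "REF"
        return "SNP" if changed == 1 else "MNP"
    # Different lengths: a shared first base plus one allele being a prefix
    # of the other is a simple insertion/deletion; anything else is complex.
    if ref[0] == alt[0] and (ref.startswith(alt) or alt.startswith(ref)):
        return "INDEL"
    return "COMPLEX"
```

With the record from the error, `classify_variant("TG", "CA")` returns `"MNP"`, whereas a padded single change such as `TG` → `TA` comes out as `"SNP"`.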
I do think something is wrong with these records, but I'm not sure what exactly or how to fix it, especially since I get the same error for 6 different samples, just at different lines: in each sample it fails at the first record with 2-base alleles, like the example above (TG and CA).
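One way to confirm that pattern across the 6 samples is to find, in each file, the first record whose REF and ALT are an MNP and compare its line number with the one in the error. This is a minimal sketch with hypothetical function names; it assumes the "line N" in SnpEff's message counts header lines too (if it does not, subtract the number of `#` lines), and it reads plain or gzip/bgzip-compressed VCFs.

```python
import gzip
from typing import Iterable, Optional


def first_mnp_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based line number of the first MNP record, or None.

    Header lines are counted, because every line is enumerated.
    """
    for line_no, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta lines and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            if alt.startswith("<") or len(alt) != len(ref):
                continue  # symbolic and length-changing alleles are not MNPs
            changed = sum(1 for r, a in zip(ref, alt) if r != a)
            if changed > 1:
                return line_no
    return None


def first_mnp_line_in_file(path: str) -> Optional[int]:
    """Open a plain or gzip/bgzip-compressed VCF and scan it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_mnp_line(handle)
```

Calling `first_mnp_line_in_file("sample1.vcf.gz")` (placeholder file name) for each sample and seeing the same numbers as in the errors would support the MNP explanation.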
I will try bcftools norm and see how that looks, thank you.
I tried bcftools norm, but it didn't change anything.
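That is probably expected: default normalization (left-aligning and trimming shared bases) leaves a same-length MNP like `TG` → `CA` untouched, because there is nothing to trim. Recent bcftools versions document an `-a`/`--atomize` option for `bcftools norm` that splits MNPs into consecutive SNVs; whether the Galaxy wrapper exposes it depends on its version. The sketch below only illustrates what atomizing does to a single tab-split record; `atomize_mnp` is a hypothetical helper, not a bcftools or SnpEff function, and it handles just one ALT allele.

```python
from typing import List


def atomize_mnp(fields: List[str]) -> List[List[str]]:
    """Split one tab-split VCF data line carrying an MNP into per-base SNV records.

    Only a single ALT allele of the same length as REF is split; anything else
    is returned unchanged. Columns from QUAL onward (QUAL, FILTER, INFO,
    FORMAT, samples) are copied verbatim, so INFO values such as LEN=2 or
    TYPE=mnp become stale in the output.
    """
    chrom, pos, vid, ref, alt = fields[:5]
    rest = fields[5:]
    if "," in alt or alt.startswith("<") or len(ref) != len(alt) or len(ref) < 2:
        return [fields]
    records = []
    for offset, (r, a) in enumerate(zip(ref, alt)):
        if r != a:  # emit a record only where the base actually changes
            records.append([chrom, str(int(pos) + offset), vid, r, a] + rest)
    return records
```

Applied to the record from the error, `TG` → `CA` at 16890671 becomes `T` → `C` at 16890671 and `G` → `A` at 16890672. The trade-off is that the two SNVs are no longer tied together, so SnpEff annotates each change on its own, and the protein-level consequence it reports for each may differ from the combined codon change of the original MNP.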
I ran SnpEff without selecting any options, using .vcf instead of .vcf.gz files, and that seems to have worked. However, these 2 consecutive mismatches are still there...
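That last run changed two things at once (no options, and uncompressed input), so it is not clear which one made the difference; rerunning with only one of them changed would tell. Since the stack trace points into HGVS protein formatting rather than file reading, the options may be the more likely factor, but that is a guess. For the uncompressed input, outside Galaxy the conversion is just `gunzip -c file.vcf.gz > file.vcf`; a minimal Python equivalent is sketched here (the function name `gunzip_vcf` is hypothetical). Python's `gzip` module also reads bgzip output, because BGZF is a series of concatenated gzip members.

```python
import gzip
import shutil


def gunzip_vcf(src: str, dst: str) -> None:
    """Decompress a gzip/bgzip-compressed VCF to plain text (like `gunzip -c src > dst`)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks instead of loading the whole file
```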