I want to annotate SNPs in a VCF file generated by GATK. The reference FASTA and GFF files for Citrus clementina were downloaded from http://www.phytozome.net/.
Make the directories data/cle and data/genomes:

    data
    ├── cle
    │   ├── genes.gff
    └── genomes
        └── cle.fa
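For reference, the layout above could be created with something like the sketch below. The function name `setup_snpeff_dirs` is made up for illustration, and the source paths in the commented `cp` lines are placeholders, not the real Phytozome file names.

```shell
#!/usr/bin/env bash
# Sketch: create the data/<genome> and data/genomes directories
# under a chosen root (assumed here to be the snpEff install directory).
setup_snpeff_dirs() {
    local genome="$1"       # genome short name, e.g. cle
    local root="${2:-.}"    # root directory, defaults to the current one
    mkdir -p "$root/data/$genome" "$root/data/genomes"
}

# Example usage (placeholder paths, adjust to the downloaded files):
#   setup_snpeff_dirs cle .
#   cp /path/to/annotation.gff3 data/cle/genes.gff
#   cp /path/to/genome.fa       data/genomes/cle.fa
```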
Add the reference genome entry to snpEff.config:
#Cclementina, version Citrus clementina
cle.genome : Cclementina
Build the database manually by running:
java -jar snpEff.jar build -gff3 -v cle
After that, run:
java -Xmx4g -jar snpEff.jar -ud 1000 cle file.vcf > new.file.vcf
This fails with the following error:
Error: Error while processing VCF entry (line 1514) :
scaffold_1 68760 . CAT C 37.73 QD AC=1;AF=0.500;AN=2;BaseQRankSum=-0.852;ClippingRankSum=-1.445;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=50.00;MQRankSum=0.556;QD=1.51;ReadPosRankSum=0.630;SOR=0.465;ANN=C|downstream_gene_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009781m.v1.0|protein_coding||c.1407_1408delAT|||||122|WARNING_TRANSCRIPT_NO_START_CODON,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009318m.v1.0|protein_coding|4/6|c.481+28_481+29delAT||||||,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009356m.v1.0|protein_coding|4/6|c.481+28_481+29delAT||||||,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009638m.v1.0|protein_coding|1/3|c.0-411_0-410delAT||||||WARNING_TRANSCRIPT_NO_START_CODON,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009675m.v1.0|protein_coding|1/3|c.0-23_0-22delAT||||||WARNING_TRANSCRIPT_NO_START_CODON,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009637m.v1.0|protein_coding|1/3|c.0-411_0-410delAT||||||WARNING_TRANSCRIPT_NO_START_CODON,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009674m.v1.0|protein_coding|1/3|c.0-23_0-22delAT||||||WARNING_TRANSCRIPT_NO_START_CODON,C|intron_variant|MODIFIER|Ciclev10009318m.g|Ciclev10009318m.g.v1.0|transcript|Ciclev10009683m.v1.0|protein_coding|1/2|c.0-672_0-671delAT||||||WARNING_TRANSCRIPT_NO_START_CODON GT:AD:DP:GQ:PL 0/1:21,4:25:75:75,0,910
java.lang.RuntimeException: Interval error: end before start.
Class : Marker
Start : 68241
End : 67353
ID :
Parent class : Chromosome
Parent : scaffold_1 0-28940637 CHROMOSOME 'scaffold_1'
java.lang.RuntimeException: Interval error: end before start.
Class : Marker
Start : 68241
End : 67353
ID :
Parent class : Chromosome
Parent : scaffold_1 0-28940637 CHROMOSOME 'scaffold_1'
at ca.mcgill.mcb.pcingola.interval.Interval.<init>(Interval.java:38)
at ca.mcgill.mcb.pcingola.interval.Marker.<init>(Marker.java:37)
at ca.mcgill.mcb.pcingola.snpEffect.LossOfFunction.isLofDeletion(LossOfFunction.java:226)
at ca.mcgill.mcb.pcingola.snpEffect.LossOfFunction.isLof(LossOfFunction.java:155)
at ca.mcgill.mcb.pcingola.snpEffect.LossOfFunction.isLof(LossOfFunction.java:115)
at ca.mcgill.mcb.pcingola.outputFormatter.VcfOutputFormatter.addInfo(VcfOutputFormatter.java:170)
at ca.mcgill.mcb.pcingola.outputFormatter.VcfOutputFormatter.toString(VcfOutputFormatter.java:285)
at ca.mcgill.mcb.pcingola.outputFormatter.OutputFormatter.endSection(OutputFormatter.java:112)
at ca.mcgill.mcb.pcingola.outputFormatter.VcfOutputFormatter.endSection(VcfOutputFormatter.java:229)
at ca.mcgill.mcb.pcingola.outputFormatter.OutputFormatter.printSection(OutputFormatter.java:145)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:278)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEffCmdEff.annotateVcf(SnpEffCmdEff.java:446)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:138)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:939)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEff.run(SnpEff.java:978)
at ca.mcgill.mcb.pcingola.snpEffect.commandLine.SnpEff.main(SnpEff.java:136)
Line 1514 of the VCF is:
scaffold_1 68760 . CAT C 37.73 QD AC=1;AF=0.500;AN=2;BaseQRankSum=-0.852;ClippingRankSum=-1.445;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=50.00;MQRankSum=0.556;QD=1.51;ReadPosRankSum=0.630;SOR=0.465 GT:AD:DP:GQ:PL 0/1:21,4:25:75:75,0,910
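To see which annotated features sit at this position, a small helper along these lines might be useful. It is only a sketch: the function name `gff_features_at` is made up for illustration, and it assumes a standard tab-separated GFF3 file with the sequence name in column 1 and 1-based start and end in columns 4 and 5.

```shell
#!/usr/bin/env bash
# Sketch: print every GFF3 record on sequence <seq> whose
# [start, end] interval (columns 4 and 5) contains position <pos>.
gff_features_at() {
    local gff="$1" seq="$2" pos="$3"
    awk -F'\t' -v seq="$seq" -v pos="$pos" '
        /^#/ { next }                                   # skip comments and directives
        $1 == seq && ($4 + 0) <= (pos + 0) && ($5 + 0) >= (pos + 0)
    ' "$gff"
}

# Example usage with the variant from the error message:
#   gff_features_at data/cle/genes.gff scaffold_1 68760
```

The same call with 67353 or 68241 (the End and Start values in the exception) would show which features touch the interval that snpEff complains about.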
I am using SnpEff 4.2. Any help would be appreciated.
The GFF seems OK. I got rid of this error by using SnpEff version 4.1l instead. It's a strange error.
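For anyone who wants to rule out the annotation file themselves, one quick sanity check is to list GFF records whose start is greater than their end, since GFF3 requires start <= end regardless of strand. This is only a sketch (the function name `gff_inverted_intervals` is made up, and a tab-separated GFF3 file is assumed). An empty result does not explain the exception; it only suggests that the intervals in the file are not inverted.

```shell
#!/usr/bin/env bash
# Sketch: print GFF3 records where column 4 (start) > column 5 (end).
gff_inverted_intervals() {
    local gff="$1"
    awk -F'\t' '
        /^#/ { next }                       # skip comments and directives
        NF >= 5 && ($4 + 0) > ($5 + 0)      # numeric comparison of start and end
    ' "$gff"
}

# Example usage:
#   gff_inverted_intervals data/cle/genes.gff
```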