Hey, I'm fairly new to all of this, so sorry in advance if I miss something obvious. I am trying to annotate some VCF files with information from various databases. All of the input files seem valid, because SnpSift's vcfcheck doesn't flag any of them. However, I can't get any tool to read the output files, and when I run an output file through vcfcheck, I get this error:
VcfFileIterator.parseVcfLine(114): Fatal error reading file 'GE_annotated.vcf' (line: 1):
??# # f I l e f o r m a t = V C F v 4 . 0
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Impropper VCF entry: Not enough fields (missing tab separators?).
??# # f I l e f o r m a t = V C F v 4 . 0
at ca.mcgill.mcb.pcingola.fileIterator.VcfFileIterator.parseVcfLine(VcfFileIterator.java:115)
at ca.mcgill.mcb.pcingola.fileIterator.VcfFileIterator.readNext(VcfFileIterator.java:166)
at ca.mcgill.mcb.pcingola.fileIterator.VcfFileIterator.readNext(VcfFileIterator.java:56)
at ca.mcgill.mcb.pcingola.fileIterator.FileIterator.hasNext(FileIterator.java:84)
at ca.mcgill.mcb.pcingola.fileIterator.MarkerFileIterator.hasNext(MarkerFileIterator.java:62)
at ca.mcgill.mcb.pcingola.snpSift.SnpSiftCmdVcfCheck.check(SnpSiftCmdVcfCheck.java:36)
at ca.mcgill.mcb.pcingola.snpSift.SnpSiftCmdVcfCheck.run(SnpSiftCmdVcfCheck.java:57)
at ca.mcgill.mcb.pcingola.snpSift.SnpSift.run(SnpSift.java:335)
at ca.mcgill.mcb.pcingola.snpSift.SnpSift.main(SnpSift.java:70)
Caused by: java.lang.RuntimeException: Impropper VCF entry: Not enough fields (missing tab separators?).
??# # f I l e f o r m a t = V C F v 4 . 0
at ca.mcgill.mcb.pcingola.vcf.VcfEntry.parse(VcfEntry.java:850)
at ca.mcgill.mcb.pcingola.vcf.VcfEntry.<init>(VcfEntry.java:124)
at ca.mcgill.mcb.pcingola.fileIterator.VcfFileIterator.parseVcfLine(VcfFileIterator.java:112)
... 8 more
I have gotten similar errors using different VCFs and different databases, so I doubt that a particular input file is causing the problem. I double-checked the number of columns and it looks correct, so I am not sure what is wrong. The line it complains about also appears in the input file, which passes vcfcheck, so I don't understand why it causes an error now. I'm not sure whether the semicolons that SnpSift inserts to separate multiple entries in the same column might be causing problems, or whether something else is going on. Any help would be much appreciated.
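One possible reading of the error: the offending line is printed as `??` followed by characters separated by spaces, which is consistent with the output file having been written as UTF-16 (a byte-order mark followed by two bytes per character) rather than plain ASCII/UTF-8. For example, Windows PowerShell 5.x writes UTF-16 by default when you redirect output with `>`. A file like that can look fine in an editor while the parser finds no tab-separated fields at all. The Python sketch below checks a file for a byte-order mark and counts tab-separated fields on every non-`##` line (VCF has 8 fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO). It is a minimal diagnostic written under that assumption and is not part of SnpSift; the function name `check_vcf_bytes` is hypothetical.

```python
import codecs
import sys

# (BOM bytes, label, codec used to decode the rest of the file).
# Longer marks come first because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE", "utf-32-le"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE", "utf-32-be"),
    (codecs.BOM_UTF8, "UTF-8", "utf-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE", "utf-16-le"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE", "utf-16-be"),
]

# The fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO.
MIN_VCF_FIELDS = 8


def check_vcf_bytes(raw: bytes) -> list:
    """Return human-readable problems found in raw VCF content (empty list if none)."""
    problems = []
    text = None
    for bom, label, codec in BOMS:
        if raw.startswith(bom):
            problems.append(f"file starts with a {label} byte-order mark; expected plain text")
            # Decode the remainder so the column check below still works.
            text = raw[len(bom):].decode(codec, errors="replace")
            break
    if text is None:
        if b"\x00" in raw:
            problems.append("file contains NUL bytes (possibly UTF-16 without a byte-order mark)")
        text = raw.decode("utf-8", errors="replace")

    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip meta-information lines (##...) and blank lines; check #CHROM and data lines.
        if not line or line.startswith("##"):
            continue
        n_fields = len(line.split("\t"))
        if n_fields < MIN_VCF_FIELDS:
            problems.append(
                f"line {lineno}: {n_fields} tab-separated field(s), expected at least {MIN_VCF_FIELDS}"
            )
    return problems


if __name__ == "__main__":
    # Check every file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            found = check_vcf_bytes(fh.read())
        print(f"{path}: " + ("; ".join(found) if found else "no encoding or column-count problems found"))
```

Saved as, say, `check_vcf.py` (a hypothetical name), it would be run as `python check_vcf.py GE_annotated.vcf`. Comparing its output for an input file and the corresponding annotated file would show whether the encoding changed somewhere between the two.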
Are you sure it accepts VCF version 4.0? This format seems pretty outdated.
I don't know. I am having trouble finding documentation for any format version newer than 4.2. Would the files from 1000 Genomes be outdated?