Malformed VCF file at approximately line number 101449: The VCF specification does not allow for whitespace in the INFO field
7.6 years ago
bruseq ▴ 40

Hello

I am using the BisSNP-0.82.2 tool to find SNPs in bisulfite data, but I get the error below when running this command:

java -Xmx4g -jar BisSNP-0.82.2.jar -R /home/genomics2/ALM_BS_genome_new.fasta -T BisulfiteGenotyper -I /home/genomics2/tools/ALM1Fd_R1_paired_trimmed_withRG.bam -D /home/genomics2/tools/dbSNP/merge_vcf_file2.vcf -vfn1 /home/genomics2/tool/dbSNP/cpg.raw.vcf -vfn2 /home/genomics2/tool/dbSNP/snp.raw.vcf -L /home/genomics2/tools/ALM1_R1_paired_vcf_lenght.bed

ERROR MESSAGE: The provided VCF file is malformed at approximately line number 101449: The VCF specification does not allow for whitespace in the INFO field.

Please guide me on how to resolve this error.

Thanks..

vcf gatk malformedvcf bissnp • 5.2k views

Thank you both so much for your valuable help with this error. I removed all the whitespace from the INFO column of my file, but when I execute the command below, it generates empty cpg.vcf and snp.vcf output files.

Command used:

java -Xmx4g -jar BisSNP-0.82.2.jar -R /home/genomics2/tools/For_METHGO/ALM_BS_genome_new.fasta -T BisulfiteGenotyper -I /home/genomics2/tools/For_METHGO/ALM1Fd_R1_paired_trimmed_withRG.bam -D /home/genomics2/tools/For_METHGO/dbSNP/merge_new_dbSNP21.vcf -vfn1 /home/genomics2/tools/For_METHGO/dbSNP/cpg1.raw.vcf -vfn2 /home/genomics2/tools/For_METHGO/dbSNP/snp1.raw.vcf -L /home/genomics2/tools/For_METHGO/ALM1_R1_paired_vcf_lenght.bed

Are there any suggestions on how to resolve this?

Thanks in advance.


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

7.6 years ago

Philipp Bayer's solution is good, but since the error says "at approximately line number ...", the offending line may be somewhere else. There might also be other buggy lines. This awk script prints every line whose 8th field contains a space, followed by its line number:

awk -v FS='\t' '$8 ~ " " {print $0, NR}' snp.raw.vcf
7.6 years ago

Run this command:

head -101449 /home/genomics2/tool/dbSNP/snp.raw.vcf | tail -1

This shows the line it complains about. Check what the error is (something borked in the INFO field?), then fix it.
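If the problem does turn out to be spaces inside the INFO column, one possible way to fix every affected line in one pass is sketched below. This is only a sketch under assumptions: the function name `fix_info_whitespace` is hypothetical, the file is assumed to be a tab-delimited VCF, and replacing each space with an underscore is an arbitrary choice that may not suit every annotation.

```shell
# Hypothetical helper: replace spaces in the INFO column (field 8) with
# underscores. Assumes a tab-delimited VCF; the underscore substitution
# is an assumption, not something the VCF specification prescribes.
fix_info_whitespace() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Only touch data lines (not "#" headers) that have an INFO field.
         !/^#/ && NF >= 8 { gsub(/ /, "_", $8) }
         { print }' "$1"
}

# Example usage (file names are placeholders):
# fix_info_whitespace input.vcf > input.fixed.vcf
```

Writing to a new file instead of editing in place keeps the original available for comparison, and the awk check from the other answer can be re-run on the new file to confirm that no INFO field still contains a space.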
