Dear all,
I split my tetraploid genome into chromosomes, and each chromosome into its short arm and long arm, in order to run variant calling in parallel. I am now doing some filtering steps, and all the chromosomes worked fine except one, chr7B_long_arm, which gives me the following error:
[W::vcf_parse] Contig '��f$�h��
���eԎ���H���`ݶ
f{�Fo�Y����@00uMb�z-��I$&�gf���7Ӵ�u|'K.�oP' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
If I look at the VCF file I don't find any error. I do not understand what the problem is, and worst of all I do not know how to locate this error in the VCF file to see what is going wrong, since all the other files worked just fine. These are the chromosome names:
##contig=<ID=chr1A,length=585266722>
##contig=<ID=chr1B,length=681112512>
##contig=<ID=chr2A,length=775448786>
##contig=<ID=chr2B,length=790338525>
##contig=<ID=chr3A,length=746673839>
##contig=<ID=chr3B,length=836514780>
##contig=<ID=chr4A,length=736872137>
##contig=<ID=chr4B,length=676292951>
##contig=<ID=chr5A,length=669155517>
##contig=<ID=chr5B,length=701372996>
##contig=<ID=chr6A,length=615672275>
##contig=<ID=chr6B,length=698614761>
##contig=<ID=chr7A,length=728031845>
##contig=<ID=chr7B,length=722970987>
##contig=<ID=chrUn,length=498719471>
This is the command line:
rule filter_f1:
    input:
        donevcf="freeb/{chr}.flanking.vcf"
    output:
        f1=temp("freeb/{chr}.flanking.f1.vcf")
    shell:
        """
        /Tools/bcftools/bcftools view --types snps -m2 -M2 -q 0.01:minor {input.donevcf} > {output.f1}
        """
And the output of "file my.vcf" is:
my.vcf: Variant Call Format (VCF) version 4.2, ASCII text, with very long lines
I also get an error when I try to create a vcf.gz file with bgzip and index it with tabix:
[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "P���F�.��o��9B<~."
What was the command line? What is the output of "file the.vcf"? What are the names of the chromosomes in the reference?
I updated the question to address all your questions.
Thank you.
What is the output of
grep '^chr7B_long_arm' the.vcf | file -
and please show us a line of
grep -m1 '^chr7B_long_arm' the.vcf
I don't see chr7B_long_arm here.
Actually, chr7B_long_arm is the name of the VCF file that does not work.
I solved the problem by going back to the original bgzipped file and decompressing it again; now it works. Something probably went wrong when I decompressed the files the first time.
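For anyone who hits the same warning and wants to locate the damaged region before re-decompressing, here is a minimal Python sketch. It is not part of bcftools or htslib; the function names find_suspect_lines and gzip_is_intact are made up for illustration. It assumes the VCF is plain ASCII (as "file" reported above) and that contigs are declared with ##contig=<ID=...> header lines. The first function reports data lines that contain non-printable bytes or whose CHROM field is not declared in the header; the second reads a gzip/bgzip file to the end to see whether it decompresses without error.

```python
import gzip
import re
import zlib

# Matches header lines such as ##contig=<ID=chr7B,length=722970987>
CONTIG_RE = re.compile(rb"##contig=<ID=([^,>]+)")


def find_suspect_lines(path, max_hits=5):
    """Return (line_number, CHROM prefix) for data lines that look corrupted.

    Hypothetical helper: a line is flagged if it contains bytes outside
    printable ASCII (tab, LF and CR are allowed), or if its first column
    is not one of the contig IDs declared in the header.
    """
    contigs = set()
    hits = []
    with open(path, "rb") as handle:  # binary mode, so bad bytes cannot raise decode errors
        for number, line in enumerate(handle, start=1):
            if line.startswith(b"#"):
                match = CONTIG_RE.match(line)
                if match:
                    contigs.add(match.group(1))
                continue
            chrom = line.split(b"\t", 1)[0].rstrip(b"\r\n")
            has_binary = any(
                byte not in (9, 10, 13) and not 32 <= byte <= 126 for byte in line
            )
            # Only compare against the header if it declares any contigs at all.
            unknown_contig = bool(contigs) and chrom not in contigs
            if has_binary or unknown_contig:
                hits.append((number, chrom[:40]))
                if len(hits) >= max_hits:
                    break
    return hits


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip/bgzip file at `path` decompresses to the end."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Calling find_suspect_lines("freeb/chr7B_long_arm.flanking.vcf") (a path guessed from the rule above, so adjust it to your layout) gives the first few line numbers to inspect, for example with sed -n. If gzip_is_intact returns True for the original compressed file while the decompressed copy has suspect lines, that points to the decompression step rather than the variant caller, which matches what happened here.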