Hello!
I want to index my .vcf.gz file and do some other stuff with it.
1-) I wanted to index it and got an error:
[E::vcf_parse_format] Invalid character '.' in 'GQ' FORMAT
So, I unzipped the file, changed the header, and bgzipped it again:
gunzip -c my_file.vcf.gz | sed 's/^##fileformat=VCFv4.1/##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">/' | bgzip > 2my_file.vcf.gz && tabix -f -p vcf 2my_file.vcf.gz
2-) Then, I wanted to extract SNPs and do some other stuff via this script:
mkdir input
bcftools norm -Ov -m-snps 2my_file.vcf.gz | bcftools norm -Ov --check-ref w -f reference.fasta > input/norm.vcf.gz
but got an error like:
Failed to open 2my_file.vcf.gz: unknown file type
and I checked the file type with htsfile and got:
2my_file.vcf.gz: SAM version 1 BGZF-compressed sequence data
So, I tried:
cp 2my_file.vcf.gz plain.vcf
bcftools view -Oz -o compressed.vcf.gz plain.vcf
htsfile compressed.vcf.gz plain.vcf
bcftools index compressed.vcf.gz
but it got stuck on the bcftools view line and threw an error like:
Failed to open plain.vcf: unknown file type
So, what else can I do? I'd appreciate it if you could share your ideas! Thanks!
PS: In case you wonder, I used the fasta file with bcftools norm because it also throws an error with sample names, and this was the solution I found.
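A note on the "unknown file type" errors: the sed command above replaces the ##fileformat=VCFv4.1 line with the GQ definition instead of adding to it. The file then no longer starts with a fileformat line, which is most likely why htsfile and bcftools stop recognising it as VCF. Below is a minimal sketch of inserting the definition while keeping ##fileformat. It uses awk on a made-up toy file; workdir, toy.vcf and fixed.vcf are placeholder names, not from the original post.

```shell
# Toy VCF standing in for my_file.vcf.gz (contents are made up for illustration).
workdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
  printf 'scaffold1\t100\t.\tA\tG\t.\tPASS\t.\tGT:GQ\t0/1:12.5\n'
} > "$workdir/toy.vcf"

# Print the ##fileformat line unchanged, then add the GQ definition right after it.
# Every other line is passed through as-is.
awk '
  /^##fileformat=/ {
    print
    print "##FORMAT=<ID=GQ,Number=1,Type=Float,Description=\"Genotype Quality\">"
    next
  }
  { print }
' "$workdir/toy.vcf" > "$workdir/fixed.vcf"

# The first two lines should be the fileformat line followed by the new definition.
head -n 2 "$workdir/fixed.vcf"
```

On the real file, the same awk program would take the place of sed in the original pipeline, between gunzip -c and bgzip. Whether the added definition is enough to silence the GQ error depends on what the existing header already declares for GQ.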
Hey!
Thanks, the first solution worked but now this:
Error at scaffold95:5127, the tag GP has wrong number of fields
Here is the line of interest from the header:
Do you have any idea about this too?
PS: The second solution gave this error in case you wonder:
The error you're seeing is due to the content of that tag on a particular line, so if you find the offending line you'll be able to trace the error. The definition of the GP tag in the header seems fine, although I'm not used to that tag and it could be a problem if you define it as integer while the VCF format documentation describes it as float.
Regarding the second solution, it just had that extra ^ at the beginning of the line in the second section of the sed substitution by mistake. I've corrected it.
Nailed it like:
I guess all issues related to "the wrong number of fields" can be fixed by changing Number=what_ever_it_is to Number=. (with the dot) in the header.
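For anyone landing here with the same error, here is a rough sketch of the two steps discussed above: tracing the record named in the error message, then relaxing the GP declaration to Number=. in the header. The toy file, its GP header line (Number=3) and the names workdir, toy.vcf and relaxed.vcf are all made up for illustration. The real header line was not preserved in this thread.

```shell
# Toy VCF with a GP field that has 2 values where the header promises 3.
workdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '##FORMAT=<ID=GP,Number=3,Type=Float,Description="Genotype probabilities">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
  printf 'scaffold95\t5000\t.\tA\tG\t.\tPASS\t.\tGT:GP\t0/1:0.1,0.8,0.1\n'
  printf 'scaffold95\t5127\t.\tC\tT\t.\tPASS\t.\tGT:GP\t0/1:0.2,0.8\n'
} > "$workdir/toy.vcf"

# 1) Pull out the record named in the error message (scaffold95:5127).
offending=$(awk -F'\t' '$1 == "scaffold95" && $2 == 5127' "$workdir/toy.vcf")
printf '%s\n' "$offending"

# 2) Show how GP is declared in the header, to compare with the record.
grep '^##FORMAT=<ID=GP,' "$workdir/toy.vcf"

# 3) Relax the declaration to Number=. ; the address restricts the
#    substitution to the GP header line, so records are left untouched.
sed '/^##FORMAT=<ID=GP,/ s/Number=[^,]*/Number=./' \
  "$workdir/toy.vcf" > "$workdir/relaxed.vcf"
gp_line=$(grep '^##FORMAT=<ID=GP,' "$workdir/relaxed.vcf")
printf '%s\n' "$gp_line"
```

Number=. tells the parser that the number of values is unknown or varies, so it silences the check rather than repairing the record. It is still worth looking at the offending line from step 1 to see whether the data itself is wrong.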