2.4 years ago
biostars
I downloaded a 13G VCF file for a set of chimpanzees from here:
https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Pan_troglodytes.vcf.gz
I am trying to index this vcf with
bcftools index -f Pan_troglodytes.vcf.gz
But I get the error:
[E::bgzf_read_block] Invalid BGZF header at offset 12889964510
index: failed to create index for "Pan_troglodytes.vcf.gz"
If I try without the -f flag (bcftools index Pan_troglodytes.vcf.gz), then the error I get is:
index: the input is probably truncated, use -f to index anyway
I deleted the file and re-downloaded it, but got the same problem.
The other VCF files I downloaded from this site, e.g. https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Gorilla.vcf.gz, work absolutely fine.
Does anyone know what is causing this error, and how to solve it?
Versions: bcftools 1.14-48-g58f886f Using htslib 1.14-22-g3f7e13e
test the file is not corrupted:
this command produces
gzip: Pan_troglodytes.vcf.gz: decompression OK, trailing garbage ignored
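The message above looks like the one GNU `gzip -t` prints when the compressed stream itself decompresses cleanly but extra bytes follow it. As a rough sketch (assuming GNU gzip, whose documented exit statuses are 0 for success, 1 for an error and 2 for a warning; `check_gzip` is a made-up helper name, not part of any tool), the result can be turned into something a script can act on:

```shell
#!/bin/sh
# check_gzip FILE: classify the result of `gzip -t FILE`.
# Hypothetical helper; assumes GNU gzip exit codes (0 ok, 1 error, 2 warning).
check_gzip() {
    status=0
    gzip -t "$1" 2>/dev/null || status=$?
    case "$status" in
        0) echo "ok" ;;        # stream is intact, nothing after it
        2) echo "warning" ;;   # e.g. "trailing garbage ignored"
        *) echo "error" ;;     # e.g. truncated or corrupt stream
    esac
}

# Example use (file name taken from the question):
# check_gzip Pan_troglodytes.vcf.gz
```

A "warning" result here would be consistent with the bcftools error: the data may be fine, but something that is not a valid BGZF block sits at the end of the file.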
It is not guaranteed that the file you downloaded is bgzipped; it may be plain gzipped, in which case it would not be indexed properly. You can try:
gunzip -c Pan_troglodytes.vcf.gz | bgzip -c > Pan_troglodytes.vcf.bgz; bcftools index -f Pan_troglodytes.vcf.bgz;
Actually, I checked with
file Pan_troglodytes.vcf.gz
and it said: Blocked GNU Zip Format (BGZF; gzip compatible)
So I'm unsure. The file could be legitimately corrupt. The command above did seem to help somewhat, but it's hard to verify the integrity of the data (the gunzip step reported "trailing garbage ignored").
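One extra check that may help here: a complete BGZF file ends with a fixed 28-byte empty block that acts as an EOF marker (defined in the SAM/BAM specification). The sketch below compares the last 28 bytes of a file against that marker. `has_bgzf_eof` is a hypothetical helper name, and this only tells you whether the file ends cleanly, not whether every record in it is intact.

```shell
#!/bin/sh
# has_bgzf_eof FILE: succeed if FILE ends with the 28-byte BGZF EOF marker.
# Hypothetical helper, not part of htslib or bcftools.
has_bgzf_eof() {
    # Hex form of the empty BGZF block from the SAM/BAM specification.
    expected="1f8b08040000000000ff0600424302001b0003000000000000000000"
    # Dump the final 28 bytes as hex and strip od's spacing and newlines.
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}

# Example use (file names taken from the thread):
# has_bgzf_eof Pan_troglodytes.vcf.gz  && echo "original ends cleanly"
# has_bgzf_eof Pan_troglodytes.vcf.bgz && echo "recompressed copy ends cleanly"
```

If the original fails this check while the recompressed copy passes, that would fit the "trailing garbage" explanation: stray bytes after the last valid block rather than damage in the middle of the data.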