Hi,
I am trying to extract SNPs with the hisat2_extract_snps_haplotypes_VCF.py script from the Ensembl VCF file (ftp://ftp.ensembl.org/pub/release-87/variation/vcf/danio_rerio/Danio_rerio.vcf.gz) in order to build a zebrafish index for the GRCz10_GCA_000002035.3 assembly (ftp://ftp.ensembl.org/pub/release-87/fasta/danio_rerio/dna/Danio_rerio.GRCz10.dna.toplevel.fa.gz).
I ran: hisat2_extract_snps_haplotypes_VCF.py -v Danio_rerio.GRCz10.dna.toplevel.fa Danio_rerio.vcf Danio_rerio.GRCz10.87
The script ran for some time and then stopped with the following error:
Traceback (most recent call last):
  File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 892, in <module>
    args.verbose)
  File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 730, in main
    genotypes)
  File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 688, in add_vars
    tmp_vars = extract_vars(chr_dic, chr, pos, ref_allele, alt_alleles, varID)
  File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 103, in extract_vars
    assert min_len >= 1
AssertionError
Note: the output file contains only partial information for chr1.
Also, I've tested the SNP files chromosome by chromosome and only got the error in some of them.
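In case it helps to narrow this down, here is a minimal sketch of how the offending records could be located. It rests on an assumption I have not confirmed: that min_len in extract_vars is the shorter of the REF and ALT allele lengths, so the assertion would fail on a record with an empty allele. The function name find_empty_allele_records and the sample records (rs_a, rs_b, rs_c) are made up for illustration and are not taken from the Ensembl file.

```python
def find_empty_allele_records(vcf_lines):
    """Return (chrom, pos, var_id, ref, alt) for records whose REF
    or any ALT allele is an empty string.

    Assumption: an empty allele is what trips `assert min_len >= 1`.
    """
    suspicious = []
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, var_id, ref, alt = fields[:5]
        # ALT may hold several comma-separated alleles; "".split(",") gives [""].
        alt_alleles = alt.split(",")
        if len(ref) == 0 or any(len(a) == 0 for a in alt_alleles):
            suspicious.append((chrom, int(pos), var_id, ref, alt))
    return suspicious


# Made-up records for illustration only (not from Danio_rerio.vcf).
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\trs_a\tA\tG\t.\t.\t.\n",    # normal SNP
    "1\t200\trs_b\tAT\t\t.\t.\t.\n",    # empty ALT
    "1\t300\trs_c\tC\tT,\t.\t.\t.\n",   # one empty ALT allele in the list
]
hits = find_empty_allele_records(sample)
for hit in hits:
    print(hit)
```

To run it on the real data, an open file handle can be passed in place of the list, for example `with open("Danio_rerio.vcf") as fh: hits = find_empty_allele_records(fh)`. If this returns nothing for the chromosomes that fail, then my assumption about the cause is probably wrong.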
Any help or advice will be welcome,
Thanks in advance,
Christian