Hi all,
I was hoping to get some help working out why this Python script (installed from https://ppp.readthedocs.io/en/latest/PPP_pages/Utilities/vcf_bed_to_seq.html#) is not working for me.
The program is intended to convert SNP data into sequence data, using a VCF or BED file with a reference FASTA file. As there is no reference Neanderthal FASTA file, the human one is used.
The command I am running is:
vcf_bed_to_seq.py --vcf neanderthal_file.vcf --model-file out.model --modelname 1Pop --fasta-reference GCF_000001405.25_GRCh37.p13_genomic.fna.gz --region 3:49828647-49848193
This raises the error:
KeyError: "sequence '3' not present"
To deal with this, I changed the name used for chromosome 3 in both the VCF and the FASTA file to the same value, chromosome3. I ran the following command on each file, adjusting the search pattern to match whatever the original chromosome name was in that file:
sed -i 's/chromosome 3/chromosome3/g' new_neanderthal_file.vcf
However, it still raises the same error, now with the new name:
KeyError: "sequence 'chromosome3' not present"
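In case it helps with diagnosing this, here is a minimal sketch of how the chromosome names on each side could be compared. It is only a sketch under assumptions: it assumes the FASTA lookup goes by the text after ">" up to the first whitespace on each header line (which is how samtools-style indexes name sequences, but I have not confirmed that this script does the same), and the function names below are hypothetical helpers of mine, not part of vcf_bed_to_seq.py.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_sequence_names(path):
    """Return FASTA sequence names: the text after '>' up to the first whitespace."""
    names = []
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def vcf_chromosome_names(path):
    """Return the distinct CHROM values of a VCF's data lines, in first-seen order."""
    names = []
    seen = set()
    with open_text(path) as handle:
        for line in handle:
            # Skip meta/header lines (starting with '#') and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                names.append(chrom)
    return names


def region_chromosome(region):
    """Return the chromosome part of a 'chrom:start-end' region string."""
    return region.rsplit(":", 1)[0]
```

With these, the check would be whether region_chromosome("3:49828647-49848193") appears in both fasta_sequence_names(...) for the reference and vcf_chromosome_names(...) for the VCF. Scanning the whole reference this way is slow, but it only needs to be done once.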
All the other inputs are correct (such as the model file and the model name), so I know those aren't the issue. The error is always raised at the same lines of the script (492, 487, 303), in that order. This happens whether I use a .vcf or a .bed file.
The GRCh37 reference is being used because that is what the original Neanderthal sequence was aligned to.
Could anyone help?