Raising a KeyError when I know the key is in both files

2.1 years ago

hemr3 ▴ 10

Hi all,

I was hoping to get some help working out why this Python script (installed from https://ppp.readthedocs.io/en/latest/PPP_pages/Utilities/vcf_bed_to_seq.html#) is not working for me.

The program is intended to convert SNP data into sequence data, using a VCF or BED file with a reference FASTA file. As there is no reference Neanderthal FASTA file, the human one is used.

The command I am running is:

vcf_bed_to_seq.py --vcf neanderthal_file.vcf --model-file out.model --modelname 1Pop --fasta-reference GCF_000001405.25_GRCh37.p13_genomic.fna.gz --region 3:49828647-49848193

This raises the error:

KeyError: "sequence '3' not present"

To deal with this, I changed the chromosome 3 name in both the VCF and the FASTA file to the same string, chromosome3. I ran the following command on each file (adjusting the pattern to match that file's original chromosome name):

sed -i 's/chromosome 3/chromosome3/g' new_neanderthal_file.vcf

However, it still raises the same error, now with the new name:

KeyError: "sequence 'chromosome3' not present"
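In case it helps with diagnosing this, here is a minimal sketch of how the sequence names in the two files could be compared directly. It assumes, as FASTA indexers commonly do, that a sequence name is the first whitespace-delimited token of the header line, so a header such as ">chromosome 3" would be named "chromosome" under that assumption. The function names are hypothetical and are not part of vcf_bed_to_seq.py.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_names(path):
    """Return sequence names from a FASTA file.

    Assumption: the name is the first whitespace-delimited token
    that follows '>' on each header line.
    """
    names = []
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def vcf_chroms(path):
    """Return distinct CHROM values from VCF data lines, in order of first appearance."""
    seen = {}
    with open_text(path) as handle:
        for line in handle:
            # Skip meta-information lines, the column header, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def missing_from_fasta(vcf_path, fasta_path):
    """Return VCF chromosome names that have no matching FASTA sequence name."""
    names = set(fasta_names(fasta_path))
    return [chrom for chrom in vcf_chroms(vcf_path) if chrom not in names]


# Example call with the file names from this post:
# print(fasta_names("GCF_000001405.25_GRCh37.p13_genomic.fna.gz")[:5])
# print(missing_from_fasta("new_neanderthal_file.vcf",
#                          "GCF_000001405.25_GRCh37.p13_genomic.fna.gz"))
```

An empty list from missing_from_fasta would mean every chromosome name in the VCF has an exact match among the FASTA names. Any name it returns is one that a name-based lookup could not find.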

All other inputs (such as the model file and the model name) are correct, so I know those aren't the issue. The error is always raised at the same lines (492, 487, 303), in that order. This happens whether I use a .vcf or a .bed file.

The GRCh37 assembly is being used because this is what the original Neanderthal sequence was aligned to.

Could anyone help?

fasta python VCF • 375 views

Updated 23 months ago by Ram 44k • written 2.1 years ago by hemr3 ▴ 10