I have downloaded the 1000 genomes phase 3 vcf file for chromosome 1 and annotated it using snpEff. I am now trying to parse the annotated file to create a new text file with only the data I need. My issue is when I try to parse the annotation field. My code for this bit is as below:
import vcf

tempList = []
vcf_reader = vcf.Reader(open('/ann.chr1.vcf', 'r'))
for record in vcf_reader:
    # Split each ANN entry into its pipe-separated subfields
    annList = [i.split('|') for i in record.INFO['ANN']]
This runs but I get the error:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
KeyError: 'ANN'
I have tried running this same code on a much smaller file (I literally just took a few lines from the vcf file as well as the metadata to create a small file to test on) and it runs fine, but when I run it on the full vcf file I am getting this error. I have even tried appending the whole 'ANN' field to a list using:
for record in vcf_reader:
    annList.append(record.INFO['ANN'])
This works fine for all other fields (e.g. record.INFO['CHROM']), but I get the same error for the 'ANN' field. The code does run for a while, and I have checked the length of the list, but it is different every time I run the code, which suggests it is stopping at a different point on each run. I am really not sure what is going on here. Thanks.
Try debugging with a try-except statement to figure out on which lines it goes wrong, then look at those lines:
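A minimal sketch of that idea, under a few assumptions: the KeyError suggests that at least one record has no 'ANN' key in its INFO dictionary, so the loop below catches the error and notes the position instead of crashing. The function name split_annotations and the demo records are made up for illustration. The demo objects only mimic the CHROM, POS and INFO attributes of a PyVCF record so that the snippet runs without the real file.

```python
from types import SimpleNamespace


def split_annotations(records, key="ANN"):
    """Split the annotation field of each record; note records that lack it."""
    parsed = []   # one list of split annotations per record that has the key
    missing = []  # (CHROM, POS) of every record where the key is absent
    for record in records:
        try:
            parsed.append([entry.split("|") for entry in record.INFO[key]])
        except KeyError:
            # Remember where the lookup failed instead of stopping the loop
            missing.append((record.CHROM, record.POS))
    return parsed, missing


# Hypothetical stand-in records (not real 1000 Genomes data)
demo = [
    SimpleNamespace(CHROM="1", POS=10177,
                    INFO={"ANN": ["C|intergenic_region|MODIFIER"]}),
    SimpleNamespace(CHROM="1", POS=10235, INFO={"AC": [6]}),
]

parsed, missing = split_annotations(demo)
print(missing)  # [('1', 10235)]

# With the real file you would pass the reader instead of the demo list, e.g.
#   import vcf
#   parsed, missing = split_annotations(vcf.Reader(open('/ann.chr1.vcf', 'r')))
```

Once you have the positions in missing, look at those lines in the annotated VCF to see whether the ANN field is really absent there or whether something else is wrong with them.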