PyVCF error: KeyError: 'ANN' when appending annotation fields to a list
7.4 years ago
spiral01 ▴ 110

I have downloaded the 1000 genomes phase 3 vcf file for chromosome 1 and annotated it using snpEff. I am now trying to parse the annotated file to create a new text file with only the data I need. My issue is when I try to parse the annotation field. My code for this bit is as below:

import vcf

tempList = []
vcf_reader = vcf.Reader(open('/ann.chr1.vcf', 'r'))
for record in vcf_reader:
    annList = [i.split('|') for i in record.INFO['ANN']]

This runs but I get the error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
KeyError: 'ANN'

I have tried running this same code on a much smaller file (I literally just took a few lines from the vcf file as well as the metadata to create a small file to test on) and it runs fine, but when I run it on the full vcf file I am getting this error. I have even tried appending the whole 'ANN' field to a list using:

annList = []
for record in vcf_reader:
    annList.append(record.INFO['ANN'])

This works fine for all other fields (e.g. record.INFO['CHROM']), but I get the same error for the 'ANN' field. The code runs for a while before failing, and the length of the list is different every time I run it, which suggests it stops at a different point each time. I am really not sure what is going on here. Thanks.

python PyVCF snp • 1.8k views

Try debugging with a try-except statement to figure out on which lines it goes wrong, then look at those lines:

for record in vcf_reader:
    try:
        annList.append(record.INFO['ANN'])
    except KeyError:
        # Print the record that has no 'ANN' entry so it can be inspected
        print(record)
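
If the try-except turns up records that simply have no 'ANN' entry, one possible way to carry on is to read the field with `dict.get` and keep track of the positions that lack it. The following is only a sketch: the function name `collect_annotations` is made up for illustration, and the demo records are stand-ins built with `SimpleNamespace` so the snippet runs without a VCF file. It assumes each record exposes `CHROM`, `POS` and a dict-like `INFO`, as PyVCF records do.

```python
from types import SimpleNamespace


def collect_annotations(records, key="ANN"):
    """Split each record's annotation entries on '|'.

    Returns (annotations, missing), where `missing` lists the
    (CHROM, POS) of every record that has no `key` in its INFO.
    """
    annotations = []
    missing = []
    for record in records:
        entries = record.INFO.get(key)  # None instead of KeyError
        if entries is None:
            missing.append((record.CHROM, record.POS))
            continue
        annotations.append([entry.split("|") for entry in entries])
    return annotations, missing


# Stand-in records for demonstration only (hypothetical values).
# With PyVCF you would pass vcf.Reader(open(path, 'r')) instead.
demo_records = [
    SimpleNamespace(
        CHROM="1",
        POS=100,
        INFO={"ANN": ["A|missense_variant|MODERATE",
                      "A|upstream_gene_variant|MODIFIER"]},
    ),
    SimpleNamespace(CHROM="1", POS=200, INFO={"AC": [1]}),
]

annotations, missing = collect_annotations(demo_records)
print(len(annotations), "annotated record(s); missing ANN at:", missing)
```

Looking at the positions in `missing` in the original file should show whether those lines really were left unannotated or whether something else is wrong with them.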