I would like to use pypgen for my analyses. I have installed pypgen on my Linux system and checked the installation with the example VCF file, which produced the expected output. Now I am trying to use VCF files from the 1000 Genomes Project, but I get the following error message. Am I doing something wrong? Do I need to preprocess the VCF files from the 1000 Genomes Project? Thanks.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 314, in _handle_tasks
    for i, task in enumerate(taskseq):
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 236, in <genexpr>
    self._taskqueue.put((((result._job, i, func, (x,), {})
  File "/home/Programs/pypgen-master/doc/src/package/scripts/vcfSNVfstats", line 52, in vcf_iterator
    vcf_line = parse_vcf_line(line, empty_vcf_line)
  File "/home/Programs/pypgen-master/doc/src/package/pypgen/parser/VCF.py", line 302, in parse_vcf_line
    genotype['GQ'] = float(genotype['GQ'])
KeyError: 'GQ'
Does your VCF file contain a GQ format field? It appears that pypgen needs to access that field.
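If it helps, here is a rough way to check that yourself. This is only a sketch, not part of pypgen: the function names `records_missing_format_key` and `open_vcf` are made up for illustration, it is written for Python 3 rather than the Python 2.7 shown in the traceback, and it assumes a standard tab-separated VCF where the ninth column is FORMAT. It reports records whose FORMAT column does not list GQ.

```python
import gzip


def records_missing_format_key(lines, key="GQ"):
    """Yield (CHROM, POS) for data lines whose FORMAT column lacks `key`.

    Hypothetical helper for illustration; not a pypgen function.
    """
    for line in lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a usable VCF data line
        # FORMAT is the 9th column; a sites-only VCF has no FORMAT at all.
        fmt = fields[8] if len(fields) >= 9 else ""
        if key not in fmt.split(":"):
            yield fields[0], fields[1]


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Tiny in-memory example: the second record has GT only, so it is reported.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:99\n",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0|1\n",
]
missing = list(records_missing_format_key(example))
print(missing)
```

For a real file you would pass `open_vcf("your_file.vcf.gz")` instead of the example list. If every record is reported, the file carries no GQ values, which would be consistent with the KeyError above.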
You are a step ahead of me, as I'm struggling to get the example VCF file to run. Could you share exactly how you ran it, i.e. from which folder and with which commands? I typed the command:
vcfSNVfstats -i /home/pypgen-0.2.0/src/packages/scripts/example.vcf.gz -p pop1:c511,c512,c513,c514,c515,c563,c614,c630,c639,c640 pop2:pop2:m523,m524,m525,m589,m675,m676,m682,m683,m687,m689 -c 2 -r Chr01:1-10001 | head
I get the error "vcfSNVfstats: command not found".
I'm sure I'm missing something really obvious here. Any ideas?
@Clare: Please ask another question rather than including your question in a comment. Feel free to include a link to this question in your new question if you think they are related.