Hi, I'm trying to convert my .vcf file into a format that TreeMix can use. I've tried several of the approaches that have been suggested before. First, I ran:
./vcf2treemix.sh ~/directory/data.vcf ~/plink.cluster3
This results in the errors:
ERROR: Problem parsing the command line arguments.
TypeError: 'dict' object is not callable
I read that this error occurs because Python 3 changed how dict() can be called, so I went to one of the source scripts used by the bash script above, plink2treemix.py.
I have all the PLINK-format files for data.vcf, and ran the following:
plink1 --bfile ~/directory/data.vcf --freq --missing --within data.cluster
python2 plink2treemix.py ~/directory/data.vcf.frq.strat.gz chr22treemix
I used Python 2 here to get around Python 3 no longer having the has_key method on dict objects. But this used too much memory, and the PC killed the process.
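For reference, this is my understanding of the Python 3 change, as a minimal sketch. The dictionary and the function name are made up for illustration and are not taken from plink2treemix.py:

```python
# Hypothetical data for illustration only; not from plink2treemix.py.
pop2rs = {"pop1": ["rs1", "rs2"]}

def has_pop(table, name):
    # Python 2 code often writes table.has_key(name).
    # That method was removed in Python 3; the `in` operator
    # does the same membership test and works in both versions.
    return name in table

print(has_pop(pop2rs, "pop1"))
print(has_pop(pop2rs, "pop2"))
```

If that is right, replacing each has_key call with an `in` test should let the script run under Python 3, although I assume it would not help with the memory usage.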
The same memory problem happens when I try the populations program from Stacks 2.57:
/home/usr/stacks-2.57/bin/populations --in-vcf ~/directory/data.vcf --treemix -O ./ -M pop_map.tsv
Overall, no matter what I try, there is a memory error and the process gets killed.
My data.vcf is a very large file (9 GB or so), but even when I gzip the file, it's still too much for the computer's memory.
Is there any way to make this less memory intensive? Is there any other way to convert .vcf files to the TreeMix format? Can someone help me with the TypeError in the first issue?