Memory issue with my PC when trying to convert PLINK to TreeMix

4.2 years ago

hemr3

Hi, I'm trying to convert my .vcf file into a format that TreeMix can read. I've tried several approaches that have been suggested before:

./vcf2treemix.sh ~/directory/data.vcf ~/plink.cluster3

This produces the following errors:

ERROR: Problem parsing the command line arguments.

TypeError: 'dict' object is not callable

I read that this error comes from a change in how dictionaries behave in Python 3, so I turned to plink2treemix.py, one of the scripts called by the bash script above.
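For reference, Python 3 raises two different errors for the dictionary problems mentioned here, and they are easy to confuse. The sketch below (the function name `python3_dict_errors` and the example dict are purely illustrative) shows that `'dict' object is not callable` is a TypeError raised when a dictionary is called like a function, whereas calling the removed `has_key()` method raises an AttributeError. The full traceback for the TypeError above isn't shown, so which line triggered it is unknown; this only illustrates the two cases and the portable `k in d` replacement.

```python
def python3_dict_errors():
    """Reproduce the two dict-related errors Python 3 raises on old code."""
    pop2rs = {"popA": {}}  # hypothetical example dict

    # Calling a dict as if it were a function raises the TypeError from the post.
    try:
        pop2rs("popA")
    except TypeError as err:
        call_error = str(err)

    # dict.has_key() was removed in Python 3, so it raises AttributeError.
    try:
        pop2rs.has_key("popA")
    except AttributeError as err:
        has_key_error = str(err)

    # Portable replacement for pop2rs.has_key("popA"): the `in` operator.
    found = "popA" in pop2rs
    return call_error, has_key_error, found
```

Calling `python3_dict_errors()` returns the TypeError message, the AttributeError message, and `True` from the `in` check.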

I have all the PLINK-format files (.bed/.bim/.fam) for data.vcf, and ran:

plink1 --bfile ~/directory/data.vcf --freq --missing --within data.cluster

python2 plink2treemix.py ~/directory/data.vcf.frq.strat.gz chr22treemix

To get around Python 3 no longer providing dict.has_key(), I ran the script with Python 2 instead. However, this used too much memory, and the process was killed.
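plink2treemix.py appears to load every SNP for every population into nested dictionaries before writing any output, so its memory use grows with the size of the .frq.strat file. Below is a minimal Python 3 sketch of a streaming alternative that holds only one SNP in memory at a time. It assumes the standard PLINK .frq.strat columns (CHR SNP CLST A1 A2 MAF MAC NCHROBS) and that all cluster rows for one SNP are written consecutively. It writes the TreeMix layout: a header line of population names, then one `MAC,NCHROBS-MAC` pair per population per SNP. The script name `frqstrat2treemix.py` and all function names are hypothetical.

```python
import gzip
import sys
from typing import Dict, Iterable, List, Optional, TextIO, Tuple


def open_text(path: str, mode: str = "rt") -> TextIO:
    """Open a plain or gzip-compressed text file, chosen by the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def frq_strat_to_treemix(lines: Iterable[str], out: TextIO) -> int:
    """Stream PLINK .frq.strat rows into TreeMix format; return SNPs written.

    Assumes columns CHR SNP CLST A1 A2 MAF MAC NCHROBS, with all cluster rows
    for one SNP written consecutively. Only one SNP is held in memory.
    """
    pops: List[str] = []            # population order, fixed by the first SNP
    current: Dict[str, str] = {}    # cluster -> "MAC,NCHROBS-MAC" for this SNP
    current_key: Optional[Tuple[str, str]] = None
    n_snps = 0

    def flush() -> None:
        nonlocal pops, n_snps
        if not current:
            return
        if not pops:
            pops = list(current)
            out.write(" ".join(pops) + "\n")  # TreeMix header line
        # Raises KeyError if a later SNP lacks one of the populations.
        out.write(" ".join(current[p] for p in pops) + "\n")
        n_snps += 1

    rows = iter(lines)
    next(rows, None)  # skip the .frq.strat header
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        key = (fields[0], fields[1])  # (CHR, SNP)
        clst = fields[2]
        mac, nobs = int(fields[6]), int(fields[7])
        # A new SNP starts when the ID changes or a cluster repeats; the
        # second test copes with VCF-derived IDs that are all ".".
        if key != current_key or clst in current:
            flush()
            current = {}
            current_key = key
        current[clst] = f"{mac},{nobs - mac}"
    flush()
    return n_snps


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 frqstrat2treemix.py in.frq.strat.gz out.gz
    with open_text(sys.argv[1]) as src, open_text(sys.argv[2], "wt") as dst:
        frq_strat_to_treemix(src, dst)
```

For example, `python3 frqstrat2treemix.py data.frq.strat.gz data.treemix.gz`. TreeMix expects gzipped input, which is why an output name ending in `.gz` is written compressed here.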

The same thing happens when I use the populations program from STACKS 2.57:

/home/usr/stacks-2.57/bin/populations --in-vcf ~/directory/data.vcf --treemix -O ./ -M pop_map.tsv
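The STACKS population map (`pop_map.tsv`, one `sample<TAB>population` line per sample) carries the same information that PLINK's `--within` option expects in a cluster file (`FID IID cluster`, whitespace-separated, no header). Below is a minimal sketch for building one from the other. It assumes the popmap sample names match the IID column of the .fam file; how PLINK assigns FID/IID when importing a VCF depends on the import flags used, so it is worth checking that first. The script name `popmap2within.py` and the function names are hypothetical.

```python
import sys
from typing import Dict, Iterable, List


def read_popmap(lines: Iterable[str]) -> Dict[str, str]:
    """Parse a STACKS popmap: one 'sample<TAB>population' line per sample."""
    popmap: Dict[str, str] = {}
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) >= 2:  # skip blank or malformed lines
            popmap[fields[0]] = fields[1]
    return popmap


def within_lines(fam_lines: Iterable[str], popmap: Dict[str, str]) -> List[str]:
    """Build 'FID IID cluster' lines for PLINK --within, matching popmap on IID."""
    out: List[str] = []
    missing: List[str] = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue
        fid, iid = fields[0], fields[1]
        if iid in popmap:
            out.append(f"{fid} {iid} {popmap[iid]}")
        else:
            missing.append(iid)
    if missing:
        # Fail loudly rather than silently dropping samples from the analysis.
        raise KeyError(f"samples not in popmap: {missing[:5]}")
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 popmap2within.py pop_map.tsv data.fam > data.cluster
    with open(sys.argv[1]) as pm, open(sys.argv[2]) as fam:
        print("\n".join(within_lines(fam, read_popmap(pm))))
```

Raising on unmatched samples is a deliberate choice: a sample missing from the cluster file would otherwise be left out of the per-population counts without any warning.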

Overall, no matter what I try, the process runs out of memory and gets killed.

My data.vcf is very large (about 9 GB), and even when I gzip it, processing it still needs more memory than the computer has.

Is there any way to make this less memory intensive? Is there any other way to convert .vcf files to the TreeMix format? Can someone help me with the TypeError in the first issue?

STACKS bash vcf plink treemix
