Question

TreeMix is taking too long to run

2.8 years ago

Camila Martínez ▴ 40

Hello everyone! I'm trying to run TreeMix with conda. I first created a new environment and installed TreeMix (conda install treemix -c bioconda -c conda-forge). The TreeMix input file was created with the populations module from Stacks, from a VCF with 894 SNPs from 6 populations. I tried running TreeMix with the following command:

for M in {0..10}; do
    for k in 500 1000 1500; do
        echo "k = ${k}"
        treemix -i input.recode.p.treemix.gz -root OUT -m "${M}" -k "${k}" -bootstrap -o "trmxout_${M}_${k}"
    done
done

It seems to work, since it quickly writes the output files for M=0, but when it starts the analysis for M=1 it takes forever. It has been 18 hours and it is still running M=1 with k=500. I also tried with only k=1500, and I think it finished because it says

DONE.

But the prompt doesn't come back, so I don't think it really finished. I have run TreeMix before and it never took this long; I don't know what the problem may be. Can someone please help me? Thank you.

UPDATE: It seems that if I run without the "k" parameter (i.e. without grouping the SNPs into blocks of k SNPs) it works perfectly, except with a very high "m" parameter, like 9 or more migration edges. I think that if I pre-filter my VCF for LD I might not need the k parameter, so the problem wouldn't matter anymore. But I still don't know what was happening. Any thoughts?

treemix cutadapt ubuntu • 1.4k views

updated 9 months ago by Michael 55k • written 2.8 years ago by Camila Martínez ▴ 40

score 0 · Answer 1 · 2024-08-20

Hi, Camila and everyone stumbling on this. I know it has been a long time, but I just ran into a similar problem. If TreeMix takes forever for higher M values, it may have to do with the input file. Check the log file for lines with the log-likelihood (LL). If LL = nan, the program might silently never terminate. Then check the input frequency file: it should have many lines (ideally as many as there are SNPs in the input). If there are only 2 lines in that file, or far fewer lines than SNPs, there might have been an error during the generation of the input file.
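
The two checks above can be sketched as a pair of small shell helpers. This is only a rough sketch: the function names count_snps and has_nan are made up for illustration, it assumes the TreeMix input is gzipped with a single header line of population names, and it assumes you captured TreeMix's screen output in a log file of your own naming.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a TreeMix run; names are illustrative.

# Count SNP rows in a gzipped TreeMix input file.
# Assumes the first line is the header with population names.
count_snps() {
    local n
    n=$(gzip -dc "$1" | wc -l)
    echo $(( n - 1 ))
}

# Succeed (exit status 0) if the given log file contains "nan" as a whole word,
# in any letter case; fail otherwise.
has_nan() {
    grep -qiw 'nan' "$1"
}
```

For example, count_snps input.recode.p.treemix.gz should print a number close to the SNP count of the VCF (894 in the question), and has_nan treemix_run.log (a hypothetical log file name) succeeds when a nan likelihood was reported.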

For example, if you are following this TreeMix howto and use this commonly used vcf2treemix script, your chromosome names need to be in the format chr1, chr2, ... If they are named e.g. 1, 2, 3, you may get a TreeMix input file with only 1 SNP per chromosome, and this will cause the behavior described above.
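
A quick way to see whether a VCF is affected is to list the chromosome names that do not start with "chr". The helper below is a minimal sketch under the assumption of an uncompressed, tab-separated VCF; the function name non_chr_contigs is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct names in the CHROM column (column 1)
# that do NOT start with "chr". Header lines starting with "#" are skipped.
non_chr_contigs() {
    awk -F'\t' '!/^#/ && $1 !~ /^chr/ { print $1 }' "$1" | sort -u
}
```

If non_chr_contigs my.vcf (my.vcf being a placeholder file name) prints nothing, all chromosome names already start with "chr". For a gzipped VCF, decompress it first, e.g. with gzip -dc.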

If the input is properly formatted but you still see LL = nan, you might simply have too few SNPs overall.
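
To get a feeling for "too few SNPs" in combination with -k: the TreeMix -k option sets the number of SNPs per block (not a distance in bases), and the blocks are what the standard errors are estimated from. The sketch below just does the integer division for the numbers in the question. It counts complete blocks only; how TreeMix treats a partial trailing block is not covered here, and the function name n_blocks is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of complete blocks of k SNPs among n SNPs.
n_blocks() {
    echo $(( $1 / $2 ))
}

# Numbers from the question: 894 SNPs, k in 500 1000 1500.
for k in 500 1000 1500; do
    echo "k=${k}: $(n_blocks 894 "${k}") complete block(s)"
done
# k=500: 1 complete block(s)
# k=1000: 0 complete block(s)
# k=1500: 0 complete block(s)
```

With at most one complete block there is little or nothing to estimate a standard error from, which may be related to the nan likelihoods. A much smaller k, or no -k at all on LD-pruned data, avoids that situation.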