Hi,
I just posted a similar reply on ResearchGate, so I will copy it here. Unfortunately I could not use glactools from Gabriel because I have no reference.fai (my VCF was generated with STACKS denovo).
So, I started with a .vcf file and converted it to the custom plink format. Treemix requires a specifically formatted file. The website says that in order to create this file you need to run:
$ plink --bfile data --freq --missing --within data.clust
$ gzip plink.frq
$ plink2treemix.py plink.frq.gz treemix.frq.gz
The first command (the plink line) requires a .clst file. .clst files are generated with --write-cluster --family and consist of three columns: 1. Family ID; 2. Individual ID; 3. cluster. I used several awk commands to format the information properly.
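If you would rather not chain awk commands, here is a minimal Python sketch of the same idea: it turns a two-column table (individual, population) into the three columns listed above. It assumes the Family ID is the same as the individual ID, so compare it with the first two columns of your own .fam file before relying on it. The function name popmap_to_clst and the sample IDs are made up for illustration.

```python
def popmap_to_clst(popmap_lines):
    """Turn 'individual population' lines into 'FID IID cluster' lines.

    Assumption: the family ID equals the individual ID.
    """
    clst_lines = []
    for line in popmap_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        individual, population = fields[0], fields[1]
        clst_lines.append(f"{individual} {individual} {population}")
    return clst_lines


# Made-up sample rows: individual and population separated by a tab.
example = ["IND_01\tPOP_A", "IND_02\tPOP_A", "IND_100\tPOP_C"]
print("\n".join(popmap_to_clst(example)))
```

Writing the returned lines to a file, one per line, should give something you can pass to --within.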
The first command also produces a .frq file which, according to the plink2 website, includes:
I. CHR [Chromosome code]
II. SNP [Variant identifier]
III. A1 [Allele 1 (usually minor)]
IV. A2 [Allele 2 (usually major)]
V. MAF [Allele 1 frequency]
VI. NCHROBS [Number of allele observations]
So, this is where the problems start. If you open the provided python file:
$ nano plink2treemix.py
You will notice that the script requires the file to have 8 columns (remember that Python starts counting at 0, so mc = line[6] and total = line[7] are columns 7 and 8 respectively). I think this might be a problem with the plink versions. I was using plink2 (v1.9) and the script was written when plink1 (v1.07) was around. So if you hit the following error:
ERROR: treemix mc = line[6] indexerror: list index out of range
I expect it to be a mismatch between plink versions.
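A quick way to see this coming before running plink2treemix.py is to count the columns in the header of the frequency file. The sketch below does just that; the helper names are made up, and the threshold of 8 only comes from the line[6] and line[7] indexing described above. For what it is worth, as far as I can tell plink's cluster-stratified report (.frq.strat, written when --freq is combined with --within) has 8 columns, so it may be worth checking which file you actually passed to the script.

```python
import gzip


def count_columns(header_line):
    """Number of whitespace-separated fields in a header line."""
    return len(header_line.split())


def has_enough_columns(header_line, needed=8):
    # plink2treemix.py reads line[6] and line[7], so it needs at least 8 fields.
    return count_columns(header_line) >= needed


def read_header(path):
    """Read the first line of a plain or gzipped frequency file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()


plain_frq = "CHR SNP A1 A2 MAF NCHROBS"
stratified_frq = "CHR SNP CLST A1 A2 MAF MAC NCHROBS"
print(has_enough_columns(plain_frq))       # False: only 6 columns
print(has_enough_columns(stratified_frq))  # True: 8 columns
```

On a real file you would call has_enough_columns(read_header("plink.frq.gz")) with whatever file name you are feeding to the script.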
So then I switched to plink1 and it complained that the SNPs had duplicate names. Because my data was generated with STACKS denovo, the SNPs do not have a position in the genome and <mydata.bim> had . for SNP. I used awk to solve this:
$ awk '{$2 = $1"_"$4; print}' mydata.bim > mydata.SNPrenamed.bim
This changes the second column ($2) to a combination of column 1, an underscore and column 4 ( $1"_"$4 ).
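In case it helps anyone debugging the same thing, here is a rough Python equivalent of the renaming, plus a check for IDs that still occur more than once. This is only a sketch: the function names are made up, the sample rows are invented, and it writes tab-separated output whereas the awk command writes spaces. Note that if two variants share the same value in column 1 and column 4, the new names will still collide, and the second function will show that.

```python
from collections import Counter


def rename_bim_snps(bim_lines):
    """Set column 2 (SNP ID) to '<column 1>_<column 4>' for each .bim row."""
    renamed = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        fields[1] = f"{fields[0]}_{fields[3]}"
        renamed.append("\t".join(fields))
    return renamed


def duplicated_ids(bim_lines):
    """Return the SNP IDs (column 2) that occur more than once."""
    counts = Counter(line.split()[1] for line in bim_lines if line.strip())
    return sorted(snp for snp, n in counts.items() if n > 1)


# Invented rows: chromosome, SNP ID, genetic distance, position, alleles.
bim = ["1\t.\t0\t100\tA\tG", "1\t.\t0\t250\tC\tT"]
renamed = rename_bim_snps(bim)
print(duplicated_ids(bim))      # ['.']
print(duplicated_ids(renamed))  # []
```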
Plink kept complaining so I decided to take a new approach.
I noticed that the populations program that is part of STACKS 1 has a --treemix flag.
I created a population map for stacks, which is a file composed of two columns, the first one with individuals and the second with populations:
IND_01 * TAB * POP_A
IND_02 * TAB * POP_A
IND_100 * TAB * POP_C
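The separator has to be a real tab character, and that is easy to get wrong when the file is typed or pasted by hand. Below is a small Python sketch that reports which lines of a population map do not have exactly two non-empty tab-separated fields. The function name is made up and the rule it checks is only the two-column layout described above.

```python
def popmap_problems(lines):
    """Return 1-based numbers of lines that are not 'individual<TAB>population'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text:
            continue  # ignore empty lines
        fields = text.split("\t")
        if len(fields) != 2 or not all(field.strip() for field in fields):
            problems.append(number)
    return problems


# The second row uses a space instead of a tab, so it is reported.
rows = ["IND_01\tPOP_A", "IND_02 POP_A", "IND_100\tPOP_C"]
print(popmap_problems(rows))  # [2]
```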
So I just used STACKS v1.48 and ran:
/path/to/stacks-1.48/bin/populations --in_vcf mydata.vcf --treemix -O ./ -M pop_map.tsv
--in_vcf : input vcf
--treemix : generate a treemix output
-O : output folder
-M : population map
And it worked!
Hi, have you solved this problem? I have the same problem. Thank you.
I have managed to use TreeMix. I had Illumina reads from different isolates of the same species, so I mapped each of my libraries individually to the same reference and called SNPs using the PoPoolation toolkit. I then wrote a quick script to convert the PoPoolation output to the TreeMix format.
Hi all,
I have had the same problem, though the error is:
Did anybody solve it?
Felipe
Hi, have you solved this problem? I also have the same problem. Could you give me some suggestions? Thank you!