Hi everybody!
It's a simple question, but I can't find any information on it, either on the Internet or in the PGDSpider PDF manual. I'm trying to convert a VCF file into a BayeScan input file with PGDSpider. The VCF file, obtained with SAMtools, contains SNP information for 24 individuals: 12 in one population and 12 in another. When I edit the SPID file in PGDSpider just before launching the conversion task, I'm asked whether I want to include a file with "population definitions", and then I have to select an input file containing these definitions. Since I want a BayeScan input file, I need this population information: the BayeScan file is supposed to contain the allele counts for each SNP PER POPULATION, so PGDSpider has to know which samples in the VCF file belong to which population. The problem is that I don't know what this population definitions file is supposed to look like, and there is no example in the "example" folder that comes with the program.
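Whatever its exact layout, a population definitions file boils down to a mapping from each VCF sample name to a population. I can't confirm the precise file format PGDSpider expects, so this minimal sketch only builds that mapping from the VCF "#CHROM" line (sample names start in the 10th column). It assumes the samples are listed population by population (first 12 = one population, next 12 = the other); the function name `assign_populations` and the population names are hypothetical.

```python
def assign_populations(chrom_header, pop_sizes):
    """Map each sample named in the VCF '#CHROM' line to a population.

    Assumption (hypothetical): samples appear in population order, so with
    {"pop1": 12, "pop2": 12} the first 12 samples go to pop1, the rest to pop2.
    """
    fields = chrom_header.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("expected the '#CHROM' header line")
    # Columns 1-9 are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT.
    samples = fields[9:]
    if sum(pop_sizes.values()) != len(samples):
        raise ValueError("population sizes do not add up to the number of samples")
    mapping = {}
    start = 0
    for pop, size in pop_sizes.items():  # dicts keep insertion order (Python 3.7+)
        for name in samples[start:start + size]:
            mapping[name] = pop
        start += size
    return mapping
```

If the samples are interleaved rather than grouped, the mapping would have to be written out by hand instead.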
I have tried several formats for this file, but they were all rejected. Can somebody help me? I thought of writing a Python script to parse the VCF file and create the BayeScan input file myself, but it would be a lot faster and easier to use PGDSpider.
Thanks for any help, I appreciate it!
Cheers!
No, I haven't found any direct solution to that problem with PGDSpider. Since nobody answered, instead of trying to fix it I wrote my own custom Python scripts: one parses the VCF file and produces a genotype matrix file, and another parses that genotype matrix file to create the BayeScan input file.
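For anyone who wants to try the same route, here is a rough sketch of the two steps (VCF to genotype matrix, genotype matrix to BayeScan input). It is not my actual script, and it rests on several assumptions: diploid calls with GT as the first FORMAT field, only biallelic SNPs kept, the matrix held in memory instead of written to a file, and what I understand to be BayeScan's codominant layout (`[loci]=`, `[populations]=`, then one `[pop]=` block whose lines are locus number, gene copies, number of alleles, allele counts). Please check that layout against the BayeScan manual; the function names are hypothetical.

```python
def genotype_matrix(vcf_lines):
    """Parse VCF lines into (samples, rows).

    Each row is (locus_label, counts): per sample, the number of ALT alleles
    (0, 1 or 2), or None for a missing genotype. Assumes diploid GT calls.
    """
    samples, rows = [], []
    for line in vcf_lines:
        if not line.strip() or line.startswith("##"):
            continue  # blank or meta-information line
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]  # sample names start in the 10th column
            continue
        ref, alt = fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # skip indels, multi-allelic and non-variant sites
        counts = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[0]  # the VCF spec puts GT first
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                counts.append(None)  # missing call
            else:
                counts.append(sum(a != "0" for a in alleles))
        rows.append((f"{fields[0]}:{fields[1]}", counts))
    return samples, rows


def bayescan_input(samples, rows, sample_pop):
    """Build BayeScan-style text from the matrix and a {sample: population} dict."""
    pops = sorted(set(sample_pop.values()))  # population order in the output
    out = [f"[loci]={len(rows)}", "", f"[populations]={len(pops)}", ""]
    for pop_number, pop in enumerate(pops, start=1):
        out.append(f"[pop]={pop_number}")
        cols = [i for i, s in enumerate(samples) if sample_pop.get(s) == pop]
        for locus_number, (_label, counts) in enumerate(rows, start=1):
            called = [counts[i] for i in cols if counts[i] is not None]
            n_genes = 2 * len(called)  # two gene copies per genotyped individual
            n_alt = sum(called)
            out.append(f"{locus_number} {n_genes} 2 {n_genes - n_alt} {n_alt}")
        out.append("")
    return "\n".join(out)
```

In practice you would open the VCF, pass the file handle to `genotype_matrix`, build the sample-to-population dict for your own 24 samples, and write the string returned by `bayescan_input` to a file. The `CHROM:POS` labels in `rows` are kept so BayeScan's locus numbers can be mapped back to positions afterwards.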
The only problem is that my scripts are highly customized, i.e. hard-coded for my file names and population names. If you are completely stuck, you can briefly tell me what kind of data you have and how many populations, and maybe send me the header of your VCF file (all the lines that start with "##" plus the line that starts with "#CHROM", which lists the names of your SAM files as column headers).
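If it helps, this small sketch pulls out exactly those header lines (everything from the top of the file up to the first data line) from a plain or gzip/bgzip-compressed VCF; the function names are just for illustration.

```python
import gzip


def header_lines(lines):
    """Collect the '##' meta lines and the '#CHROM' line at the top of a VCF."""
    header = []
    for line in lines:
        if not line.startswith("#"):
            break  # first data record: the header is over
        header.append(line.rstrip("\n"))
    return header


def vcf_header(path):
    """Read the header of a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return header_lines(handle)
```

Something like `print("\n".join(vcf_header("my_snps.vcf")))` (with your own file name instead of the made-up one) prints the header so you can paste it here.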