PGDSpider - What Is The Format Of The "Population Definition" File - VCF To BayeScan?
12.7 years ago

Hi everybody!

It's a simple question, but I can't find any information on it, either on the Internet or in the PDF manual for PGDSpider. I'm trying to convert a VCF file into a BayeScan input file with PGDSpider. The VCF file, obtained with SAMtools, contains the SNP information for 24 individuals: 12 in one population and 12 in another. When I edit the SPID file in PGDSpider just before launching the conversion task, I'm asked whether I want to include a file with "population definitions". Then I have to select an input file that contains these population definitions. Since I want a BayeScan input file, I need this population information: the BayeScan file is supposed to contain the SNP counts PER POPULATION, so PGDSpider has to know which samples in the VCF file belong to which population. The problem is that I don't know what this population definitions file is supposed to look like, and there is no example in the "example" folder shipped with the program.

I have tried several input formats, but they were all rejected. Can somebody help me? I thought of writing a Python script to parse the VCF file and create the BayeScan input file myself, but it would be a lot faster and easier to use PGDSpider.

Thanks for any help! I appreciate it.

Cheers!

conversion vcf • 12k views
11.5 years ago

Hi

The "population definition" file defines which individual belongs to which population. It is a simple text file with the individual names in the first column and the corresponding population names in the second column (columns are whitespace-separated):

Ind_1  pop1
Ind_2  pop1
Ind_3  pop2
Ind_4  pop4
Ind_5  pop2 
...

A short description of the file can be found in the PGDSpider manual under the VCF format (Special PGDSpider input/output questions).
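
As an illustration of the layout described above, here is a minimal Python sketch that reads the sample names from the `#CHROM` header line of a VCF and writes one "individual, population" pair per line. The helper names, the toy VCF text, and the "first half is pop1, second half is pop2" rule are hypothetical and only serve the example. It assumes the names in the first column must match the sample columns of the VCF exactly.

```python
import io

def sample_names_from_vcf(lines):
    """Return the sample names found on the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed (CHROM ... FORMAT); samples follow.
            return fields[9:]
    raise ValueError("no #CHROM header line found")

def population_definitions(samples, pop_by_sample):
    """Build the file content: one 'individual<TAB>population' pair per line."""
    return "".join(f"{name}\t{pop_by_sample[name]}\n" for name in samples)

# Hypothetical toy VCF with four samples (a real file would be opened with open()).
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tInd_1\tInd_2\tInd_3\tInd_4\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/1\n"
)

samples = sample_names_from_vcf(io.StringIO(vcf_text))

# Hypothetical assignment rule: first half of the samples is pop1, the rest pop2.
half = len(samples) // 2
pop_by_sample = {
    name: ("pop1" if i < half else "pop2") for i, name in enumerate(samples)
}

pop_file_text = population_definitions(samples, pop_by_sample)
print(pop_file_text, end="")
```

In practice the text would be written to a file, which is then selected in the SPID editor when PGDSpider asks for the population definitions.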

Cheers Heidi

12.5 years ago

Hello Francois,

I am having the same problem: I have a VCF file that I want to convert to a BayeScan input file. Did you find out what the population definitions file for PGDSpider should look like?

Many thanks

Mónica


No, I haven't found any direct solution to that problem with PGDSpider. Since I didn't get any answer, instead of trying to fix it I wrote my own custom Python scripts to parse the VCF file, produce a genotype matrix file, and then parse this genotype matrix file to create the input file for BayeScan.

The only problem is that my scripts are extremely custom, i.e., adapted to my file names and population names. If you are completely stuck with this, you can quickly tell me what kind of data you have and how many populations, and maybe give me the header of your VCF file (including the columns that contain the names of your SAM files, i.e., all the lines that start with "##" and the line that starts with "#CHROM").
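
The scripts mentioned above are not shown in this thread. The following is only a minimal sketch of the same idea, not the original code: it counts reference and alternate alleles per population from the `GT` fields of a VCF and prints them in what I understand to be BayeScan's codominant SNP layout (locus index, number of gene copies, number of alleles, then one count per allele). The function name, the toy data, and the sample-to-population mapping are hypothetical. The sketch assumes biallelic sites, skips multi-allelic ones, and ignores missing calls.

```python
import io

def vcf_to_bayescan(vcf_lines, pop_by_sample):
    """Count REF/ALT alleles per population and format them for BayeScan."""
    samples = None
    pops = []    # population names, in order of first appearance
    counts = []  # one dict per locus: {population: [ref_count, alt_count]}
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            for name in samples:
                if pop_by_sample[name] not in pops:
                    pops.append(pop_by_sample[name])
            continue
        if samples is None:
            raise ValueError("data line found before the #CHROM header")
        if "," in fields[4]:
            continue  # assumption: keep biallelic sites only
        gt_index = fields[8].split(":").index("GT")
        locus = {pop: [0, 0] for pop in pops}
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            for allele in genotype.replace("|", "/").split("/"):
                if allele in ("0", "1"):  # "." (missing) is ignored
                    locus[pop_by_sample[name]][int(allele)] += 1
        counts.append(locus)

    out = [f"[loci]={len(counts)}", "", f"[populations]={len(pops)}", ""]
    for k, pop in enumerate(pops, start=1):
        out.append(f"[pop]={k}")
        for i, locus in enumerate(counts, start=1):
            ref, alt = locus[pop]
            out.append(f"{i} {ref + alt} 2 {ref} {alt}")
        out.append("")
    return "\n".join(out)

# Hypothetical toy input: four samples, two populations, two SNPs.
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tInd_1\tInd_2\tInd_3\tInd_4\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\t./.\n"
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/1\t0/0\t0|1\n"
)
pop_by_sample = {"Ind_1": "pop1", "Ind_2": "pop1", "Ind_3": "pop2", "Ind_4": "pop2"}

bayescan_text = vcf_to_bayescan(io.StringIO(vcf_text), pop_by_sample)
print(bayescan_text, end="")
```

Loading a whole VCF into memory this way is fine for a few thousand SNPs; for larger files the per-locus counts could be streamed to one temporary file per population instead.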

11.4 years ago

I've just encountered the same problem. With PGDSpider 2.0.4, converting directly from VCF to other formats ignored the population definition file. Surprisingly, the only approach that worked was to convert first from VCF to the PGD format and then from PGD to the target format. This way, the population definition file was actually used properly. Could this be a bug?
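
For anyone who wants to script this two-step workaround, here is a rough Python sketch that only builds the two command lines. The flag names (`-inputfile`, `-inputformat`, `-outputfile`, `-outputformat`, `-spid`), the jar name `PGDSpider2-cli.jar`, and the format identifiers `VCF`, `PGD` and `GESTE_BAYE_SCAN` are written as I recall them from the PGDSpider command-line documentation, so treat them as assumptions and check them against the manual of your version. All file names are hypothetical.

```python
def two_step_commands(vcf, pgd, out, out_format, spid_to_pgd, spid_to_out,
                      jar="PGDSpider2-cli.jar"):
    """Build argv lists for VCF -> PGD, then PGD -> the target format."""
    def command(in_file, in_format, out_file, out_fmt, spid):
        # Flag names are assumptions; verify them for your PGDSpider version.
        return [
            "java", "-Xmx1024m", "-Xms512m", "-jar", jar,
            "-inputfile", in_file, "-inputformat", in_format,
            "-outputfile", out_file, "-outputformat", out_fmt,
            "-spid", spid,
        ]
    return [
        # Step 1: the SPID file here should reference the population definitions.
        command(vcf, "VCF", pgd, "PGD", spid_to_pgd),
        # Step 2: convert the intermediate PGD file to the final format.
        command(pgd, "PGD", out, out_format, spid_to_out),
    ]

# Hypothetical file names, for illustration only.
commands = two_step_commands(
    vcf="snps.vcf",
    pgd="snps.pgd",
    out="snps.bayescan.txt",
    out_format="GESTE_BAYE_SCAN",
    spid_to_pgd="vcf_to_pgd.spid",
    spid_to_out="pgd_to_bayescan.spid",
)

for argv in commands:
    # To actually run a step: subprocess.run(argv, check=True)
    print(" ".join(argv))
```

Keeping the intermediate PGD file also makes it easy to check that the populations were assigned as intended before running the second step.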

Cheers, Daniel
