3.0 years ago
reza
I have a multi-sample VCF file (20 individuals), and I want to calculate Pi (nucleotide diversity) in each population to detect signatures of selection. I ran the following commands:
vcftools --gzvcf Whole.vcf --keep pop1_list --window-pi 40000 --window-pi-step 20000 --out pop1.pi
vcftools --gzvcf Whole.vcf --keep pop2_list --window-pi 40000 --window-pi-step 20000 --out pop2.pi
These commands produced two files with different numbers of windows (86415 vs. 86430) and different SNP counts in the same windows. For example:
pop1
CHROM BIN_START BIN_END N_VARIANTS PI
NC_044511.1 1 40000 49 0.000265416
NC_044511.1 20001 60000 24 0.000146456
NC_044511.1 40001 80000 38 0.000386449
NC_044511.1 60001 100000 68 0.000650799
NC_044511.1 80001 120000 96 0.000888518
pop2
CHROM BIN_START BIN_END N_VARIANTS PI
NC_044511.1 1 40000 39 0.00030515
NC_044511.1 20001 60000 7 2.97E-05
NC_044511.1 40001 80000 39 0.000375541
NC_044511.1 60001 100000 78 0.000694135
NC_044511.1 80001 120000 102 0.000900462
However, when I run the following command, I get 60 SNPs:
bcftools stats -r NC_044511.1:1-40000
Why don't these results correspond?
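To pin down exactly where the two outputs disagree, the windowed tables can be joined on their window coordinates. The following is only a minimal sketch: the function names `read_windows` and `compare_windows` are made up for illustration, and the inline rows are the first two windows of each table above. It does not explain the cause of the difference; it only lists windows shared by both populations and windows present in just one.

```python
import io

def read_windows(lines):
    """Parse a windowed-pi table into {(CHROM, BIN_START, BIN_END): (N_VARIANTS, PI)}."""
    windows = {}
    header = None
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue
        if header is None:
            header = fields  # first non-empty line holds the column names
            continue
        row = dict(zip(header, fields))
        key = (row["CHROM"], int(row["BIN_START"]), int(row["BIN_END"]))
        windows[key] = (int(row["N_VARIANTS"]), float(row["PI"]))
    return windows

def compare_windows(pop1, pop2):
    """Return windows shared by both tables and windows unique to each."""
    shared = sorted(pop1.keys() & pop2.keys())
    only_pop1 = sorted(pop1.keys() - pop2.keys())
    only_pop2 = sorted(pop2.keys() - pop1.keys())
    return shared, only_pop1, only_pop2

# Rows copied from the pop1 and pop2 tables in this post.
POP1 = """CHROM BIN_START BIN_END N_VARIANTS PI
NC_044511.1 1 40000 49 0.000265416
NC_044511.1 20001 60000 24 0.000146456
"""
POP2 = """CHROM BIN_START BIN_END N_VARIANTS PI
NC_044511.1 1 40000 39 0.00030515
NC_044511.1 20001 60000 7 2.97E-05
"""

pop1 = read_windows(io.StringIO(POP1))
pop2 = read_windows(io.StringIO(POP2))
shared, only_pop1, only_pop2 = compare_windows(pop1, pop2)

for key in shared:
    n1, pi1 = pop1[key]
    n2, pi2 = pop2[key]
    print(key, "N_VARIANTS:", n1, n2, "PI:", pi1, pi2)
print("windows only in pop1:", len(only_pop1))
print("windows only in pop2:", len(only_pop2))
```

On the real data, `read_windows` can be given open file handles instead of the inline strings, for example `pop1.pi.windowed.pi` and `pop2.pi.windowed.pi`, since vcftools appends `.windowed.pi` to the `--out` prefix.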