Following advice given on another post, I have extracted principal components (PCs) from a VCF file containing the SNPs from 64 strains of our haploid organism. The VCF was made with the GATK best practices pipeline from a set of gVCFs, and everything ran smoothly. I converted it to PLINK format with VCFtools, extracted unlinked SNPs with PLINK's --indep-pairwise command, and then computed the PCs with --pca in PLINK 1.9.

All in all this is straightforward, and a recommended route. The problem is that when I extract different SNP sets using different --indep-pairwise parameters, I get very different PCs! Could somebody please provide a guide to choosing parameters for selecting unlinked SNPs? Otherwise it would seem that this standard procedure has a large arbitrary component. My choices (window size, step size, r² threshold) have been 5000 10 0.25 and 1500 10 0.3. I note that some people use r² thresholds as high as 0.5, which would retain weakly linked SNPs. Why, I wonder?
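
To make the comparison concrete, here is a minimal Python sketch of the kind of parameter sweep I am describing. It only builds the PLINK command lines and prints them; it does not run anything. The file prefix `strains` and the helper name `pruning_commands` are hypothetical, and I am assuming binary PLINK files (.bed/.bim/.fam) and the `.prune.in` list that --indep-pairwise writes.

```python
from itertools import product


def pruning_commands(bfile, windows, steps, r2_thresholds):
    """Build (label, prune_cmd, pca_cmd) for each --indep-pairwise setting.

    bfile is a hypothetical PLINK binary fileset prefix; nothing is executed.
    """
    runs = []
    for window, step, r2 in product(windows, steps, r2_thresholds):
        label = f"{bfile}_w{window}_s{step}_r{r2}"
        # Step 1: LD pruning; PLINK writes <label>.prune.in and <label>.prune.out
        prune_cmd = [
            "plink", "--bfile", bfile,
            "--indep-pairwise", str(window), str(step), str(r2),
            "--out", label,
        ]
        # Step 2: PCA restricted to the retained (pruned-in) SNPs
        pca_cmd = [
            "plink", "--bfile", bfile,
            "--extract", f"{label}.prune.in",
            "--pca",
            "--out", f"{label}_pca",
        ]
        runs.append((label, prune_cmd, pca_cmd))
    return runs


# The two settings from the post, expanded to the full 2 x 2 grid.
# Non-standard chromosome names may additionally need --allow-extra-chr.
for label, prune_cmd, pca_cmd in pruning_commands(
    "strains", windows=[5000, 1500], steps=[10], r2_thresholds=[0.25, 0.3]
):
    print(" ".join(prune_cmd))
    print(" ".join(pca_cmd))
```

Keeping one output prefix per setting makes it easy to line up the resulting eigenvector files and see which parameter (window or r² threshold) is actually driving the differences.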