Is it possible to produce this kind of PCA:
https://rstudio-pubs-static.s3.amazonaws.com/89838_c06c544a19f94599aa856576e7c08e2b.html
without EIGENSOFT? (For some reason I can't install it on my computer.)
GAPIT can do this for you too, but it needs a different input format: http://www.maizegenetics.net/#!gapit/cmkv For converting VCF to HapMap format, have a look at the thread "Convert Plink Ped Format Into Hapmap Format?".
You can also use FlashPCA, especially because it shows how to do LD pruning of SNPs. You can then use the output pcs.txt in the R script from your link.
Oh, I didn't realize that. I thought the whole point of PCA was to transform correlated, non-independent variables into a finite number of dimensions using a covariance matrix. I didn't realize it mattered whether two SNPs were correlated because they were close to each other on the chromosome or because they both conferred some advantage in a certain environment.
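To make the idea of LD pruning a bit more concrete, here is a minimal Python sketch. It is a simplified greedy version for illustration only, not the algorithm FlashPCA or PLINK actually use. It assumes a samples x SNPs matrix of 0/1/2 allele counts with no missing values and columns ordered by chromosomal position; the function name `ld_prune` and its parameters `window` and `r2_max` are hypothetical.

```python
import numpy as np

def ld_prune(genotypes, window=50, r2_max=0.2):
    """Greedy LD pruning sketch: return indices of SNP columns to keep.

    A SNP is kept only if its squared correlation (r^2) with every
    already-kept SNP among the preceding `window` columns is <= r2_max.
    """
    g = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(g.shape[1]):
        if g[:, j].std() == 0:
            continue  # monomorphic SNP: carries no information, skip it
        in_ld = False
        # kept is in increasing order, so walk backwards and stop at the window edge
        for k in reversed(kept):
            if j - k > window:
                break
            r = np.corrcoef(g[:, j], g[:, k])[0, 1]
            if r * r > r2_max:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept

# Toy data (made up): column 1 duplicates column 0, column 3 is monomorphic.
toy = np.array([
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 1],
    [2, 2, 0, 1],
    [2, 2, 1, 1],
])
kept_snps = ld_prune(toy)
print(kept_snps)
```

In this toy matrix the duplicated SNP is dropped because it adds no independent signal. Without pruning, a long block of tightly linked SNPs would be counted many times over and could dominate a principal component.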
Illumina has a C++ package that does partial PCA on a population VCF directly: https://github.com/Illumina/akt
(In the interest of full disclosure, I work at Illumina, but do not work on this tool)
I recently developed a new PCA analysis program, MingPCACluster, that goes from VCF to PCA and a plot (VCF2PCA and figure). It is very fast, uses little memory, and is accurate.
https://github.com/hewm2008/MingPCACluster
### run without pop.info
# ./bin/MingPCACluster -InVCF Khuman.vcf.gz -OutPut OUT
### run with pop.info
./bin/MingPCACluster -InVCF Khuman.vcf.gz -OutPut OUT -InSampleGroup pop.info
How about using PLINK to generate a genotype matrix from the VCF files and then running PCA on it?
PLINK has a lot of tools. Which one are you referring to? Is it this one?
pseq proj v-matrix ...
I am not sure, actually. But I once saw a numerical SNP matrix generated with PLINK. With that, a PCA would be easy.
Does it perform LD pruning?
Not sure. You might need to check that yourself because I haven't tried it. But I would recommend going with @Philipp's and @Michs' answers.
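For the "numerical SNP matrix, then PCA" idea mentioned in this thread, a minimal Python sketch with NumPy could look like the following. It assumes you already have a samples x SNPs matrix of 0/1/2 allele counts with no missing values; how you export that matrix from your tool of choice is not shown. The function name `genotype_pca` and the toy data are made up for illustration, and this is plain PCA on standardized columns, not a reimplementation of EIGENSOFT.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA sketch for a samples x SNPs matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)           # centre each SNP column
    sd = centered.std(axis=0)
    keep = sd > 0                           # drop monomorphic SNPs
    z = centered[:, keep] / sd[keep]        # scale each SNP to unit variance
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)                # fraction of variance per PC
    return pcs, explained[:n_components]

# Toy data (made up): rows 0-2 and rows 3-5 mimic two populations;
# the last column is monomorphic and gets dropped.
toy = np.array([
    [0, 0, 1, 2, 0, 0],
    [0, 1, 0, 2, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 0, 0],
    [2, 2, 2, 0, 1, 0],
])
pcs, explained = genotype_pca(toy)
print(pcs)
print(explained)
```

The `pcs` array has one row per sample, so its first two columns can be plotted against each other in the same way as the PC1/PC2 scatter plot in the linked R script. On real data you would normally LD-prune the SNPs first.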