PCA on VCF
8.7 years ago
Picasa ▴ 650

Is it possible to produce this kind of PCA:

https://rstudio-pubs-static.s3.amazonaws.com/89838_c06c544a19f94599aa856576e7c08e2b.html

without EIGENSOFT? (For some reason I can't install it on my computer.)

pca vcf • 6.8k views

How about using PLINK to generate a genotype matrix from the VCF files and then running PCA on it?


PLINK has a lot of tools. Which one are you referring to? Is it pseq proj v-matrix ...?


I am not sure, actually. But I once saw a numerical SNP matrix being generated with PLINK. With a matrix like that, a PCA would be easy.
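To make the "numerical SNP matrix, then PCA" idea concrete, here is a minimal Python sketch. It is not what PLINK or EIGENSOFT do internally: the helper names `vcf_to_dosage` and `genotype_pca` are hypothetical, the parser assumes biallelic sites with GT as the first FORMAT field, missing calls are mean-imputed, and each SNP is simply scaled to unit variance. No LD pruning is done here.

```python
import numpy as np


def vcf_to_dosage(vcf_lines):
    """Parse VCF text lines into (sample_names, samples x variants matrix).

    Each entry is the number of non-reference alleles (0, 1 or 2 for diploids).
    Assumes GT is the first FORMAT field; missing calls become NaN.
    """
    samples, rows = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        row = []
        for call in fields[9:]:
            alleles = call.split(":")[0].replace("|", "/").split("/")
            if "." in alleles:
                row.append(np.nan)
            else:
                row.append(sum(int(a) > 0 for a in alleles))
        rows.append(row)
    return samples, np.array(rows, dtype=float).T


def genotype_pca(dosage, n_components=2):
    """Return (scores, explained variance ratios) for a samples x variants matrix."""
    g = dosage.copy()
    # Mean-impute missing genotypes per SNP (a simplifying assumption).
    missing = np.isnan(g)
    g[missing] = np.take(np.nanmean(g, axis=0), np.where(missing)[1])
    # Drop monomorphic SNPs, then centre and scale each remaining SNP.
    g = g[:, g.std(axis=0) > 0]
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = s ** 2 / np.sum(s ** 2)
    return scores, var_ratio[:n_components]


# Tiny made-up VCF, for illustration only.
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4
1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/0\t1/1\t1/1
1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t1|1
2\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\t./.
"""

samples, dosage = vcf_to_dosage(vcf_text.splitlines())
scores, var_ratio = genotype_pca(dosage)
for name, pcs in zip(samples, scores):
    print(name, np.round(pcs, 3))
```

For a real `.vcf.gz` you would read the lines with `gzip.open(path, "rt")` instead of the inline string, and the `scores` array can be plotted in place of the EIGENSOFT output.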


Does it perform LD pruning?


Not sure. You might need to check that yourself because I haven't tried it. But I would recommend going with @Philipp's and @Michs' answers.

8.7 years ago

GAPIT can do this for you, too, but it needs a different input format: http://www.maizegenetics.net/#!gapit/cmkv For the conversion of VCF to HapMap format, have a look at this thread: "Convert Plink Ped Format Into Hapmap Format?"

You can also use FlashPCA, especially because its documentation shows how to do LD pruning of SNPs. You can then use the output pcs.txt in the R script from your link.


Thanks for your link. Just one thing: why do we have to perform LD pruning?


SNPs in LD are not independent observations: a block of correlated SNPs effectively counts the same signal several times, which spuriously inflates distances in the PCA.
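A small simulation may make this easier to see. The sketch below is an illustration under made-up assumptions (random genotypes, one SNP copied 30 times to mimic a block in perfect LD), and `ld_prune` is a hypothetical, simplified greedy r² filter, not PLINK's or FlashPCA's actual pruning procedure. It compares the share of variance taken by the first PC before and after pruning.

```python
import numpy as np


def top_pc_share(dosage):
    """Fraction of total variance captured by the first principal component."""
    g = (dosage - dosage.mean(axis=0)) / dosage.std(axis=0)
    s = np.linalg.svd(g, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)


def ld_prune(dosage, r2_threshold=0.2, window=50):
    """Greedy pruning: keep SNP j only if its r^2 with every already-kept SNP
    within `window` columns to its left is at or below `r2_threshold`.

    Assumes all columns are polymorphic (otherwise the correlation is undefined).
    """
    kept = []
    for j in range(dosage.shape[1]):
        in_ld = False
        for i in kept:
            if j - i > window:
                continue  # too far apart, not compared
            r = np.corrcoef(dosage[:, i], dosage[:, j])[0, 1]
            if r * r > r2_threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
# 200 samples, 20 unlinked SNPs (allele frequency 0.5), no population structure.
independent = rng.binomial(2, 0.5, size=(200, 20)).astype(float)
# Mimic an LD block: 30 extra copies of the first SNP placed right next to it.
ld_block = np.repeat(independent[:, :1], 30, axis=1)
dosage = np.hstack([independent[:, :1], ld_block, independent[:, 1:]])

kept = ld_prune(dosage)
share_before = top_pc_share(dosage)
share_after = top_pc_share(dosage[:, kept])
print(f"SNPs kept: {len(kept)} of {dosage.shape[1]}")
print(f"PC1 share before pruning: {share_before:.2f}, after: {share_after:.2f}")
```

Before pruning, PC1 mostly reflects the genotype at the single duplicated locus even though the samples have no real structure; after pruning, no component dominates.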


Oh, I didn't realize that. I thought the whole point of PCA was to transform correlated, non-independent variables into a finite number of dimensions using a covariance matrix. I didn't realize it mattered whether two SNPs were correlated because they were close to each other on the chromosome or because they both conferred some advantage in a certain environment.

8.7 years ago
Mitch Bekritsky ★ 1.3k

Illumina has a C++ package that does partial PCA on a population VCF directly: https://github.com/Illumina/akt

(In the interest of full disclosure, I work at Illumina, but do not work on this tool)

2.4 years ago
hewm2008 ▴ 50

I recently developed a new PCA analysis tool, MingPCACluster, that goes from a VCF to the PCA and a plot (VCF2PCA and figure). It is fast, uses little memory, and is accurate.

https://github.com/hewm2008/MingPCACluster

### run without pop.info
    # ./bin/MingPCACluster -InVCF Khuman.vcf.gz -OutPut OUT
### run with pop.info
    ./bin/MingPCACluster -InVCF Khuman.vcf.gz -OutPut OUT -InSampleGroup pop.info