Probabilistic PCA on very sparse SNP matrix
6.5 years ago
dominicdhall ▴ 40

I have a very sparse SNP matrix (~90% of genotypes missing per sample and ~90% of samples missing per SNP) on which I would like to perform some sort of probabilistic PCA. I have been using the package VariantAnnotation to get my snpMatrix object, and originally tried to mimic the method shown here (https://www.bioconductor.org/packages/release/bioc/vignettes/snpStats/inst/doc/pca-vignette.pdf ) with the package snpStats. However, I don't believe snpStats was intended to work with extremely sparse SNP matrices, and it struggles to correct for missing values within the SNP matrix.

I have also tried the ppca function from the package pcaMethods, but have not had much success in finding any clusters of cells. Does anyone have experience working with very sparse matrices for PCA?

pca probabilistic pca snpmatrix snpstats • 1.5k views

What's your goal, i.e. what insights do you hope to get from the probabilistic PCA?


Can you first filter out the sites that always have missing values?
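To make that filtering step concrete, here is a minimal sketch in plain Python (not R, and not part of snpStats or pcaMethods). It assumes the genotypes sit in a list of lists with samples as rows and SNPs as columns, coded 0/1/2, with `None` marking a missing call. The names `call_rate` and `filter_missing` and both threshold parameters are hypothetical, chosen only for illustration.

```python
def call_rate(values):
    """Fraction of non-missing (non-None) entries in a sequence."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def filter_missing(matrix, min_snp_rate=0.0, min_sample_rate=0.0):
    """Drop SNP columns, then sample rows, whose call rate is not above a threshold.

    With the default thresholds of 0.0, only SNPs and samples that are
    entirely missing are removed.
    """
    n_snps = len(matrix[0]) if matrix else 0
    # Keep SNPs (columns) that were called in enough samples.
    keep_snps = [j for j in range(n_snps)
                 if call_rate(row[j] for row in matrix) > min_snp_rate]
    reduced = [[row[j] for j in keep_snps] for row in matrix]
    # Keep samples (rows) that still have enough calls among the kept SNPs.
    keep_samples = [i for i, row in enumerate(reduced)
                    if call_rate(row) > min_sample_rate]
    filtered = [reduced[i] for i in keep_samples]
    return filtered, keep_samples, keep_snps


# Toy example: SNP 1 is never called and sample 1 has no calls at all.
genotypes = [
    [0,    None, None, 2],
    [None, None, None, None],
    [1,    None, 2,    None],
]
filtered, kept_samples, kept_snps = filter_missing(genotypes)
```

Raising `min_snp_rate` (for example to 0.05) would also drop SNPs called in only a handful of samples. With ~90% missingness, a strict threshold may leave very little data, so the right cutoff is a judgment call.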
