Hi,
Could anyone help me out with the following question?
I want to perform Principal Component Analysis (PCA) on genotype input data for SNPTEST.
I know how to perform PCA on the type of genotype data where SNPs are just the genotypes (coded as 0, 1 or 2).
However, in the file format for SNPTEST, each SNP is represented as a set of three probabilities, which correspond to the allele pairs AA, AB and BB. How can I perform PCA on this data?
I was thinking of applying a threshold, for example 0.9, and keeping only the genotypes that have probability >= 0.9. I would drop the SNPs that do not have any genotype with at least 0.9 probability. I am not sure if this approach is valid!
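To make the idea concrete, here is a minimal sketch of that thresholding in Python/NumPy. It assumes the probabilities are already loaded into an array of shape (SNPs, samples, 3) in the order AA, AB, BB. The names hard_call and keep, and the rule of dropping a SNP as soon as one sample has no confident call, are only illustrative assumptions; a missing-rate cutoff might be a more sensible rule.

```python
import numpy as np

def hard_call(probs, threshold=0.9):
    """Turn genotype probabilities into 0/1/2 calls.

    probs: array of shape (n_snps, n_samples, 3) holding P(AA), P(AB), P(BB).
    Returns a float array of shape (n_snps, n_samples) with NaN wherever
    no genotype reaches the threshold.
    """
    probs = np.asarray(probs, dtype=float)
    calls = probs.argmax(axis=2).astype(float)  # 0 = AA, 1 = AB, 2 = BB
    confident = probs.max(axis=2) >= threshold
    calls[~confident] = np.nan                  # uncertain call -> missing
    return calls

# Toy data (made up): 2 SNPs x 2 samples
probs = np.array([
    [[0.95, 0.05, 0.00], [0.02, 0.97, 0.01]],
    [[0.50, 0.40, 0.10], [0.00, 0.05, 0.95]],
])
calls = hard_call(probs)

# Assumed filtering rule: keep only SNPs with a confident call in every sample
keep = ~np.isnan(calls).any(axis=1)
filtered = calls[keep]
```

The filtered matrix is then in the usual 0/1/2 coding that I already know how to run PCA on.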
I would appreciate any suggestions on this! Thank you!
best regards, Krishna
I've never tried this and I won't pretend to be a GWAS expert, but I would try to just run the PCA with the data as it is. You might need to "tidy" the data into the following format:
I would presume that this would produce reasonable PCA results.
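One way to use the probabilities without thresholding, as a rough sketch only: collapse each triplet into an expected B-allele count (P(AB) + 2 * P(BB)) and run an ordinary PCA on the resulting samples-by-SNPs matrix. The dosage conversion is my own assumption about a reasonable encoding and not necessarily the tidy layout mentioned above; the function name dosage_pca and the random toy data are purely illustrative, and the array is assumed to be already loaded with shape (SNPs, samples, 3).

```python
import numpy as np

def dosage_pca(probs, n_components=2):
    """PCA on expected allele dosages computed from genotype probabilities.

    probs: array of shape (n_snps, n_samples, 3) holding P(AA), P(AB), P(BB).
    Returns (scores, explained), where scores has shape
    (n_samples, n_components) and explained is the fraction of variance
    captured by each returned component.
    """
    probs = np.asarray(probs, dtype=float)
    # Expected number of B alleles per SNP and sample, between 0 and 2
    dosage = probs[..., 1] + 2.0 * probs[..., 2]
    X = dosage.T                    # rows = samples, columns = SNPs
    X = X - X.mean(axis=0)          # center each SNP
    # PCA via singular value decomposition of the centered matrix
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Toy data (random, not real genotypes): 50 SNPs x 10 samples
rng = np.random.default_rng(0)
probs = rng.dirichlet([1.0, 1.0, 1.0], size=(50, 10))
scores, explained = dosage_pca(probs)
```

Compared with hard calls at a cutoff, this keeps every SNP and lets uncertain genotypes contribute fractional values, though I haven't checked how it behaves on real imputed data.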
Thank you Devon!
I am going to try that!