Principal Component Analysis
7.8 years ago

Hi,

Could anyone help me out with the following question?

I want to perform Principal Component Analysis (PCA) on genotype input data for SNPTEST.

I know how to perform PCA on the type of genotype data where SNPs are just the genotypes (coded as 0, 1 or 2).

However, in the file format for SNPTEST, each SNP is represented per sample as a set of three probabilities, which correspond to the allele pairs AA, AB and BB. How can I perform PCA on this data?

I was thinking of applying a threshold, for example 0.9, and keeping only genotypes that have probability >= 0.9. I would drop the SNPs that do not have any genotype with probability of at least 0.9. I am not sure whether this approach is valid!
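To make the proposal concrete, here is a minimal sketch of the thresholding idea in Python with NumPy. The array layout (SNPs x samples x 3), the function names and the example values are assumptions made for illustration and are not part of SNPTEST. It also assumes the strict reading of the proposal, where a SNP is dropped as soon as one sample has no genotype reaching the threshold.

```python
import numpy as np

def hard_call(probs, threshold=0.9):
    """Turn genotype probabilities into 0/1/2 calls (count of allele B).

    probs: array of shape (n_snps, n_samples, 3) holding P(AA), P(AB), P(BB).
    Returns an (n_snps, n_samples) float array; NaN marks sample/SNP pairs
    where no genotype reaches the threshold.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=2).astype(float)   # index 0/1/2 = AA/AB/BB
    confident = probs.max(axis=2) >= threshold
    return np.where(confident, best, np.nan)

def drop_incomplete_snps(calls):
    """Keep only SNPs (rows) that have a confident call in every sample."""
    keep = ~np.isnan(calls).any(axis=1)
    return calls[keep], keep

# Hypothetical example: 3 SNPs, 2 samples
probs = np.array([
    [[0.95, 0.04, 0.01], [0.02, 0.97, 0.01]],   # both samples confident
    [[0.50, 0.45, 0.05], [0.05, 0.05, 0.90]],   # sample 1 is ambiguous
    [[0.40, 0.40, 0.20], [0.60, 0.30, 0.10]],   # no confident call at all
])
calls = hard_call(probs)
filtered, keep = drop_incomplete_snps(calls)
print(filtered)   # matrix of 0/1/2 calls that a standard PCA can take
```

A looser variant would keep SNPs with a small fraction of missing calls and impute them before PCA; which choice is appropriate depends on the data.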

I would appreciate any suggestions on this! Thank you!

best regards, Krishna

PCA GWAS SNPTEST • 3.3k views

I've never tried this and I won't pretend to be a GWAS expert, but I would try to just run the PCA with the data as it is. You might need to "tidy" the data into the following format:

Position/genotype    Sample1    Sample2   ...
pos1_AA              0.9        0.85
pos1_AB              0.05       0.1
pos1_BB              0.05       0.05
...

I presume that would produce reasonable PCA results.
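A rough sketch of this suggestion, assuming the probabilities are already loaded into a NumPy array of shape (SNPs x samples x 3); the function name and the random test data are hypothetical. It builds the tidy layout shown above (one row per position/genotype, one column per sample) and runs a plain SVD-based PCA with samples as observations.

```python
import numpy as np

def pca_on_probabilities(probs, n_components=2):
    """PCA on raw genotype probabilities, samples as observations.

    probs: array of shape (n_snps, n_samples, 3) holding P(AA), P(AB), P(BB).
    Returns (scores, explained_variance_ratio).
    """
    probs = np.asarray(probs, dtype=float)
    n_snps, n_samples, _ = probs.shape
    # Tidy layout: rows pos1_AA, pos1_AB, pos1_BB, pos2_AA, ...; columns = samples
    tidy = probs.transpose(0, 2, 1).reshape(n_snps * 3, n_samples)
    X = tidy.T                    # one row per sample
    X = X - X.mean(axis=0)        # centre each feature
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2)[:n_components] / (S ** 2).sum()
    return scores, explained

# Hypothetical data: 50 SNPs, 6 samples, each probability triple sums to 1
rng = np.random.default_rng(0)
probs = rng.dirichlet([1.0, 1.0, 1.0], size=(50, 6))
scores, explained = pca_on_probabilities(probs)
print(scores.shape)   # (6, 2): one row of PC scores per sample
```

Because the three rows for a position always sum to 1, one of them is redundant. An alternative under the same assumptions is to reduce each triple to an expected allele count, P(AB) + 2 * P(BB), and run PCA on that single value per SNP.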


Thank you Devon!

I am going to try that!

7.8 years ago
Vivek ★ 2.7k

I think you are describing the Oxford gen/sample format here. You can use something like qctool or gtool to convert them to PLINK binary format and use the standard PCA tools.
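For readers who have not seen the format, here is a small sketch of what one line of a .gen file looks like when read by hand. It assumes the classic layout with five leading columns (SNP ID, rsID, position, allele A, allele B) followed by three probabilities per sample; some files carry an extra leading chromosome column, in which case `n_leading` would be 6. The function names and the example line are made up for illustration, and this is not how qctool or gtool work internally.

```python
import numpy as np

N_LEADING = 5  # assumption: SNP ID, rsID, position, allele A, allele B

def parse_gen_line(line, n_leading=N_LEADING):
    """Split one .gen line into its metadata and an (n_samples, 3) array."""
    fields = line.split()
    meta = fields[:n_leading]
    values = [float(x) for x in fields[n_leading:]]
    if not values or len(values) % 3 != 0:
        raise ValueError("expected three probabilities per sample")
    probs = np.array(values).reshape(-1, 3)   # columns: P(AA), P(AB), P(BB)
    return meta, probs

def expected_dosage(probs):
    """Expected count of allele B per sample: P(AB) + 2 * P(BB)."""
    return probs[:, 1] + 2.0 * probs[:, 2]

# Hypothetical line with three samples
line = "SNP1 rs1 1000 A G 1 0 0 0 1 0 0.1 0.2 0.7"
meta, probs = parse_gen_line(line)
dosage = expected_dosage(probs)
print(meta, dosage)
```

For real data sets the dedicated converters mentioned above are the safer route, since they also handle the matching .sample file and missing values.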


Thank you! This is exactly what I was looking for!

7.8 years ago

ANGSD, a program for analyzing NGS data, has an implementation of PCA based on genotype probabilities. You could give that a try: http://www.popgen.dk/angsd/index.php/PCA. It also takes sequencing depth into account, if that applies to your data.

Also, about what you propose: what do these genotype probabilities look like? If the two highest probabilities for a genotype are close to each other, choosing the highest would not be that "reliable" (for lack of a better word). Maybe some ratio test would be better?
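One possible reading of the ratio-test idea, sketched in Python with NumPy: call a genotype only when the highest probability is at least `min_ratio` times the second highest. The function name, the default ratio of 10 and the example values are arbitrary assumptions for illustration, not an established rule.

```python
import numpy as np

def ratio_call(probs, min_ratio=10.0):
    """Call a genotype only if it clearly beats the runner-up.

    probs: array of shape (..., 3) holding P(AA), P(AB), P(BB).
    Returns 0/1/2 (count of allele B) as floats, NaN where the call is ambiguous.
    """
    probs = np.asarray(probs, dtype=float)
    ordered = np.sort(probs, axis=-1)
    top, second = ordered[..., -1], ordered[..., -2]
    best = probs.argmax(axis=-1).astype(float)
    # Written as a product to avoid dividing by zero when second == 0;
    # top > 0 rejects all-zero triples, which carry no information.
    confident = (top > 0) & (top >= min_ratio * second)
    return np.where(confident, best, np.nan)

# Hypothetical genotype probabilities for four samples at one SNP
probs = np.array([
    [0.95, 0.04, 0.01],   # clear AA
    [0.50, 0.45, 0.05],   # AA and AB nearly tied, so left uncalled
    [0.00, 0.00, 1.00],   # certain BB
    [0.00, 0.00, 0.00],   # all zeros, left uncalled
])
calls = ratio_call(probs)
print(calls)
```

Since the three probabilities sum to 1, a high absolute cut-off such as 0.9 already implies a large ratio, so the two criteria differ mainly when the cut-off is lowered.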


Thank you! I will give it a try!
