7.4 years ago
Sheila
I am doing QC for a GWAS analysis. I used PC-AiR and PC-Relate (both available through the Bioconductor package GENESIS) to assess relatedness and population substructure in my dataset. I compared it to 1000 Genomes data and have a plot of the first two PCs from my PCA. In general, what is the best practice for excluding subjects from a study after visually inspecting the PC plot? Is there a specific method (e.g. an R package) that is considered best practice, or do I decide arbitrarily, based on the graph, which set of subjects to include?
Thanks in advance for your thoughts.
You will find a very detailed answer here: https://stats.stackexchange.com/questions/8777/in-genome-wide-association-studies-what-are-principal-components
Also, I suggest you take a look at the GenABEL manual (http://www.genabel.org/sites/default/files/pdfs/GenABEL-tutorial.pdf). Section 5.3 describes the method they use for outlier detection.
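To make the exclusion less arbitrary than eyeballing the plot, one common approach is to define a rule on the PC scores themselves. Below is a minimal sketch in Python, not the method from the GenABEL manual or from PC-AiR: it flags any sample whose score on any PC lies more than `n_sd` standard deviations from the mean of a chosen reference group (for example, the 1000 Genomes population you expect your subjects to cluster with). The function name `flag_pc_outliers`, the 6 SD cutoff, and the toy data are all assumptions for illustration; you would pick the cutoff and the number of PCs for your own study.

```python
from statistics import mean, stdev


def flag_pc_outliers(pcs, is_reference, n_sd=6.0):
    """Flag samples far from the reference cluster on any PC.

    pcs          : list of per-sample PC score rows, e.g. [(pc1, pc2), ...]
    is_reference : list of booleans, True for samples in the reference group
    n_sd         : cutoff in reference standard deviations (assumed value)
    Returns one boolean per sample (True = candidate for exclusion).
    """
    n_pcs = len(pcs[0])
    centers, spreads = [], []
    for j in range(n_pcs):
        # Mean and sample SD of PC j, computed on the reference group only
        ref_values = [row[j] for row, ref in zip(pcs, is_reference) if ref]
        centers.append(mean(ref_values))
        spreads.append(stdev(ref_values))
    return [
        any(abs(row[j] - centers[j]) > n_sd * spreads[j] for j in range(n_pcs))
        for row in pcs
    ]


# Toy data (made up): five reference samples followed by two study samples
pcs = [
    (0.0, 0.0), (1.0, -1.0), (-1.0, 1.0), (0.5, 0.5), (-0.5, -0.5),
    (0.2, 0.1),   # study sample inside the reference cluster
    (9.0, 0.3),   # study sample far away on PC1
]
is_reference = [True] * 5 + [False] * 2

flags = flag_pc_outliers(pcs, is_reference)
```

Whatever rule you settle on, it is worth fixing the cutoff before looking at the phenotype and reporting it in your methods, so the exclusion is reproducible.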