Is 1000 genomes data good enough to use for PCA?


8.6 years ago

das2000sidd ▴ 30

Hi, I am trying to combine SNP data from the 1000 Genomes Project with the common SNPs from my exome sequencing project to perform principal component analysis (PCA). I have generated a combined PLINK binary file containing my data and the 1000 Genomes SNP data, and I am using the R package SNPRelate to run the PCA. Unfortunately, whatever LD threshold I use to generate a pruned SNP set, my samples do not cluster with any of the 1000 Genomes population groups; they always cluster around the (0,0) mark in the PCA plot. Does anyone know why this might be happening, and does anyone have suggestions for how this should be done? Sincere thanks in advance for any suggestions.

sequencing next-gen • 3.1k views

updated 8.6 years ago by GenoMax 148k • written 8.6 years ago by das2000sidd ▴ 30


Yes. What allele frequency cutoff are you using? See the PCAs in the 1kg publications.
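For readers unsure what an allele frequency cutoff does ahead of PCA, here is a minimal sketch in plain NumPy. It is not the SNPRelate API and not the method used in the 1kg publications; the function names, the 0.05 default, and the mean-imputation of missing genotypes are all assumptions made for illustration. Genotypes are assumed to be coded as 0/1/2 copies of one allele, with `NaN` for missing calls.

```python
import numpy as np


def minor_allele_frequency(genotypes):
    """Per-SNP minor allele frequency for a (samples x SNPs) matrix.

    Assumes genotypes are coded 0/1/2 (copies of one allele) and that
    missing calls are NaN; missing calls are ignored in the estimate.
    """
    p = np.nanmean(genotypes, axis=0) / 2.0
    return np.minimum(p, 1.0 - p)


def filter_by_maf(genotypes, min_maf=0.05):
    """Keep SNPs whose MAF is at least min_maf (0.05 is an illustrative default)."""
    keep = minor_allele_frequency(genotypes) >= min_maf
    return genotypes[:, keep], keep


def pca_scores(genotypes, n_components=2):
    """Sample coordinates on the top principal components.

    Missing genotypes are replaced by the SNP mean (a simplifying
    assumption), columns are centered, and the scores come from an SVD.
    """
    col_means = np.nanmean(genotypes, axis=0)
    filled = np.where(np.isnan(genotypes), col_means, genotypes)
    centered = filled - col_means
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    # Synthetic data only: 20 samples x 200 SNPs drawn at random.
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.0, 0.5, size=200)
    geno = rng.binomial(2, freqs, size=(20, 200)).astype(float)
    filtered, keep = filter_by_maf(geno, min_maf=0.05)
    print("SNPs kept:", int(keep.sum()), "of", keep.size)
    print("PC scores shape:", pca_scores(filtered).shape)
```

Monomorphic and very rare SNPs carry little or no population signal, so it is worth checking how many SNPs survive the cutoff in the merged dataset before looking at the PCA plot.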

8.6 years ago by Zev.Kronenberg 12k