Hello,
I recently got some genotyping data back from the core lab from the Axiom Human Origins array, and I have been trying to analyze it (trying being the operative word). I have been having some major frustrations.
I was trying to run a PCA in R when I noticed that a few thousand of the rows were pairs sharing the same rs ID. (In case it is relevant, the table has one row per rs ID and one column per sample, with genotypes coded as 0/1/2.) I noticed the same thing within Genotyping Console (and in the data I export from it). There are many cases where there are two Probe Set IDs for the same Affy SNP ID and dbSNP RS ID values...
Does anyone know why this is? Does it mean the exact same site is being genotyped twice? How do I filter these cases out, so that when I export genotypes I am not getting duplicates of the same locus? Thanks!
First stop would be to check your annotation. Have you looked a few of the SNPs up on NetAffx (the Affymetrix support site, go to affymetrix.com) to see what their current official annotations are? I suggest you pull down the latest annotation file if you don't already have it, to make sure that you don't have a mangled copy.
I downloaded the annotation through Genotyping Console. When I started using the dataset, it asked me my Affy username (=academic email) and Affy password, and downloaded the latest files. I downloaded the annotation file manually via browser to confirm the issue, and the duplicated Affy SNP ID and dbSNP RS ID values exist there too.
David Reich's group helped design the array, so I looked up their published technical note and found this
But when I go through the annotation file, these duplicated probe sets are all biallelic. Any clue what is going on?
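While the reason for the duplicates is still open, here is a minimal sketch of one way to flag the duplicated rs IDs, check whether the paired probe sets agree, and keep one row per rs ID. It is written in plain Python rather than R, and several things in it are assumptions rather than anything from the array documentation: the probe set IDs are made up, missing calls are represented as `None`, and "keep the probe set with the fewest missing calls" is just one possible rule, not a recommendation from Affymetrix.

```python
from collections import defaultdict

def group_by_rsid(rows):
    """Group (probe_set_id, rs_id, genotypes) tuples by rs ID."""
    groups = defaultdict(list)
    for probe_id, rs_id, genotypes in rows:
        groups[rs_id].append((probe_id, genotypes))
    return groups

def concordance(g1, g2):
    """Fraction of samples with identical calls, skipping samples missing in either."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        return None  # nothing to compare
    return sum(a == b for a, b in pairs) / len(pairs)

def dedupe(rows):
    """Keep one probe set per rs ID: the one with the fewest missing calls.

    This selection rule is an assumption; ties keep the first probe set seen.
    """
    kept = []
    for rs_id, members in group_by_rsid(rows).items():
        best = min(members, key=lambda m: sum(g is None for g in m[1]))
        kept.append((best[0], rs_id, best[1]))
    return kept

# Hypothetical example data: genotypes coded 0/1/2, None = no call.
rows = [
    ("AX-0001", "rs1", [0, 1, 2, None]),
    ("AX-0002", "rs1", [0, 1, 2, 1]),
    ("AX-0003", "rs2", [2, 2, 0, 1]),
]

# rs IDs that appear on more than one probe set
duplicated = {rs: m for rs, m in group_by_rsid(rows).items() if len(m) > 1}
for rs_id, members in duplicated.items():
    (id1, g1), (id2, g2) = members[:2]
    print(rs_id, id1, id2, concordance(g1, g2))

deduped = dedupe(rows)
```

Checking concordance before dropping anything seems worthwhile: if the two probe sets for an rs ID disagree on many samples, they may not be simple replicates of the same assay, and picking one arbitrarily could hide a real problem.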