New to SNP data: I need help getting Affy Axiom data out of Genotyping Console and into a Principal Component Analysis
9.6 years ago
devenvyas ▴ 760

Hello, the overwhelming majority (i.e., >95%) of my bioinformatics experience is with the mitochondrial genome within my population of interest. As of last week, I have SNP data from an Affymetrix Axiom Human Origins array (~629,000 SNPs) on a collection of my samples, and I am feeling absolutely overwhelmed. The first analysis I need to run on the data is a Principal Component Analysis to compare my samples to previously published populations (i.e., HapMap, HGDP, whatever else there is out there; all of which I still need to download).

I've got my data out of Genotyping Console; however, the format is a mess: for whatever reason, a few thousand of the SNPs only have Affy IDs and not dbSNP rs IDs, and I am not sure what I am doing (unfortunately, most of the literature only describes what was done, never how it was done). Can anyone point me toward some form of tutorial on how to get my data into a principal component analysis? Thanks!

Affymetrix SNP • 4.8k views

Hi devenvyas,

I am working on a rather similar project, and I would be really grateful if you could give me some guidelines on what to do with the data. Thank you.

9.1 years ago
Simo ▴ 50

Hi, I'm also working with the Axiom Human Origins Array, and I got my data out of the new Axiom Analysis Suite. I also noticed that the rs codes are not shown; a list of Axiom probe codes is printed out instead. I downloaded the annotation file from the Affymetrix website and ran a Perl script to substitute each Axiom code with its associated rs code. I have also noticed something strange in the output. Can you tell me if you have noticed the same thing?
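For anyone who wants to do the same substitution without Perl, here is a minimal Python sketch of the idea. It is not the script used above. It assumes the annotation file is a CSV with leading `#` comment lines and columns named `Probe Set ID` and `dbSNP RS ID` (check the header of your own file), and that the genotypes are already in a PLINK `.map` file, where the second column is the variant ID. The function names and the IDs in the example are made up for illustration.

```python
import csv


def load_probe_to_rsid(annotation_lines,
                       probe_col="Probe Set ID",   # assumed column name
                       rsid_col="dbSNP RS ID"):    # assumed column name
    """Build a probe-ID -> rs-ID dict from annotation CSV lines.

    Lines starting with '#' are treated as comments and skipped.
    Only values that start with 'rs' are kept, so placeholders such
    as '---' or blanks never enter the mapping.
    """
    data_lines = (ln for ln in annotation_lines if not ln.startswith("#"))
    mapping = {}
    for row in csv.DictReader(data_lines):
        rsid = (row.get(rsid_col) or "").strip()
        if rsid.startswith("rs"):
            mapping[row[probe_col].strip()] = rsid
    return mapping


def rename_map_ids(map_lines, mapping):
    """Swap the variant ID (2nd column) of each .map line when a mapping exists."""
    renamed = []
    for line in map_lines:
        fields = line.split()
        if len(fields) >= 4:  # chrom, variant ID, genetic distance, bp position
            fields[1] = mapping.get(fields[1], fields[1])
        renamed.append("\t".join(fields))
    return renamed


# Hypothetical example data, not real array content.
example_annotation = [
    "# some header comment",
    '"Probe Set ID","dbSNP RS ID"',
    '"AX-0001","rs100"',
    '"AX-0002","---"',
]
example_map = ["1 AX-0001 0 1000", "1 AX-0002 0 2000"]

probe_to_rsid = load_probe_to_rsid(example_annotation)
new_map = rename_map_ids(example_map, probe_to_rsid)
```

Probes without a usable rs code simply keep their original ID here, so nothing is lost from the `.map` file.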

Some markers seem to have been tested twice (or three times in some cases). The associated probes have different codes, but the rs codes are the same. Also, the calls are different for some individuals, and it's not easy to determine which of them should be used. I noticed this even before substituting the Axiom codes with the rs ones; in fact, even the positions (already present in the output) were duplicated, or sometimes triplicated.

Have you found something similar? And have you found some information about this Microarray data analysis? I'm struggling!

Thanks


Hello, if you look at the technical documentation published for the array, it notes that ~4000 sites were genotyped twice, because for those loci the allele the probe targets affects the result (i.e., for an A/G SNP, the probe looks for either the A or the G). The documentation says those sites could possibly be triallelic.

I emailed Affy to see if there was a way to get the console to merge the genotypes, but they had no way of doing so. Given the risk of triallelic sites, my solution was to exclude those SNPs. I exported my SNP list from Genotyping Console, removed the repeating Affx numbers, and proceeded from there.
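A rough sketch of that exclusion step in Python, in case it helps someone. This is an assumption-laden illustration rather than the exact procedure described above: it identifies repeated sites by shared chromosome and base-pair position in PLINK `.map`-style rows (the original filtering was done on the Affx numbers in the Genotyping Console export), and the function name and IDs are hypothetical.

```python
from collections import Counter


def split_repeated_sites(map_rows):
    """Separate variants at unique positions from those sharing a position.

    map_rows: list of (chrom, variant_id, genetic_distance, bp) tuples.
    Returns (kept_rows, excluded_ids). Every variant at a position that
    occurs more than once is excluded, not just the extra copies.
    """
    position_counts = Counter((chrom, bp) for chrom, _vid, _cm, bp in map_rows)
    kept_rows, excluded_ids = [], []
    for row in map_rows:
        chrom, variant_id, _cm, bp = row
        if position_counts[(chrom, bp)] > 1:
            excluded_ids.append(variant_id)
        else:
            kept_rows.append(row)
    return kept_rows, excluded_ids


# Hypothetical rows: two probes target the same position on chromosome 1.
rows = [
    ("1", "AX-0001", "0", "1000"),
    ("1", "AX-0002", "0", "1000"),
    ("2", "AX-0003", "0", "5000"),
]
kept, excluded = split_repeated_sites(rows)
```

The `excluded` list, written one ID per line to a text file, is in the shape PLINK's `--exclude` option expects, so the genotype files themselves do not have to be edited by hand.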


Thank you very much! In the end I removed them as well, but only after substituting the Affx codes with the corresponding rs codes. That substitution showed that some probes have no rs code associated: for some the field is " --- ", and for others it is simply " ". Since PLINK has problems dealing with the " " ones, I'm thinking about either removing them too or replacing the " " with " --- ". Have you found something similar? Thanks


What I did was this: if a SNP had an Affx ID but no rs ID, I kept the Affx ID; if it had an rs ID, I dropped the Affx ID. As a result, my map files predominantly contain rs IDs, with a minority of Affx IDs.
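That rule is easy to express in code. Here is a minimal sketch, assuming the rs field is a plain string that may be empty, whitespace, or the " --- " placeholder mentioned above; the function name and the example IDs are hypothetical.

```python
# Values of the rs field that are treated as "no rs ID" (after stripping spaces).
MISSING_RS = {"", "---"}


def choose_variant_id(affx_id, rs_field):
    """Return the rs ID when one is present, otherwise fall back to the Affx ID."""
    rs = (rs_field or "").strip()
    return affx_id if rs in MISSING_RS else rs


# Hypothetical examples of the three cases discussed in this thread.
ids = [
    choose_variant_id("Affx-0001", "rs42"),   # has an rs ID: use it
    choose_variant_id("Affx-0002", " --- "),  # placeholder: keep the Affx ID
    choose_variant_id("Affx-0003", " "),      # blank: keep the Affx ID
]
```

Because every variant ends up with a non-blank ID, this also sidesteps the blank-ID problem in PLINK without having to drop those probes.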
