Question

GWAS Data: Merge Illumina A/B With Another Dataset

2

13.6 years ago

Rich ▴ 40

Hi,
I have two SNP datasets, case and control. The first one was genotyped using the Illumina HumanOmni1-Quad array, so all I have is A/B calls and 1/2-coded alleles in the PED file. For the control dataset I have BIM, FAM, and BED files, but I have no idea what array was used to generate that data. In the BIM file, SNPs are represented in letter format. All I know is that it relates to 'b37 forward'. I guess that means alleles in the control dataset are coded on the forward strand.
My question is: how do I merge the two datasets? I obviously can't merge two datasets with different encoding systems (1/2 and ACGT).
Moreover, I know that Illumina A/B calls solve the issue with A/T or C/G SNPs, but apparently it is not solved in my control dataset. So before merging, should I remove all such SNPs from both datasets to avoid ambiguity?
Thanks!
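
For readers following along, the A/T and C/G check described above can be sketched in a few lines of Python. This is a minimal illustration, not a tested pipeline: it assumes the standard six-column PLINK .bim layout (chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2), and the function names and example SNP IDs are hypothetical.

```python
# Complement of each nucleotide on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """Return True for A/T and C/G SNPs, whose two alleles are complements."""
    a1, a2 = allele1.upper(), allele2.upper()
    return a1 in COMPLEMENT and COMPLEMENT[a1] == a2


def ambiguous_snp_ids(bim_lines):
    """Collect the IDs of strand-ambiguous SNPs from the lines of a .bim file.

    Assumes six whitespace-separated fields per line:
    chromosome, SNP ID, genetic distance, bp position, allele 1, allele 2.
    """
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if is_strand_ambiguous(fields[4], fields[5]):
            ids.append(fields[1])
    return ids


# Made-up records for illustration only.
example = [
    "1 rs1 0 1000 A T",
    "1 rs2 0 2000 A G",
    "2 rs3 0 3000 G C",
    "2 rs4 0 4000 T C",
]
print(ambiguous_snp_ids(example))  # ['rs1', 'rs3']
```

The resulting ID list could be written to a text file, one ID per line, and passed to PLINK's `--exclude` option. Whether dropping these SNPs is the right call for a particular study is a separate question.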

gwas illumina format • 4.4k views

ADD COMMENT • link updated 13.6 years ago by Larry_Parnell 16k • written 13.6 years ago by Rich ▴ 40

score 2 · Answer 1 · 2012-02-01

2

13.6 years ago

Larry_Parnell 16k

I can tell you that whatever method you choose, you need to be very careful about which data you merge and how you do it. Merging data from different platforms was essentially the key reason that a 2010 GWAS of exceptional longevity was retracted. You can search for news articles and blog entries from that time to learn the details.

ADD COMMENT • link 13.6 years ago by Larry_Parnell 16k

0

The fact that cases and controls are on two separate platforms is particularly worrying. Be careful.

ADD REPLY • link 13.6 years ago by David Quigley 11k

0

Larry, David, thank you for the warning! But do you have any ideas on how to adjust one dataset to match the other?

ADD REPLY • link 13.6 years ago by Rich ▴ 40

0

I am not certain, because we have no experience with this. My first inclination is to look at haplotypes to check that they are consistent between the two datasets. If a given haplotype in controls is a mess or scrambled in cases, then merging is more problematic.

ADD REPLY • link 13.6 years ago by Larry_Parnell 16k