I have two SNP datasets, cases and controls. The cases were genotyped on the Illumina HumanOmni1-Quad array, so all I have are A/B calls, coded as 1/2 alleles in the PED file. For the controls I have BIM, FAM, and BED files, but I have no idea which array was used to generate the data. In the BIM file the SNP alleles are given as letters (A/C/G/T). All I know is that the data are 'b37 forward', which I take to mean that the control alleles are reported on the forward strand of build 37.
My question is how to merge the two datasets. Obviously I can't merge datasets that use different allele encodings (1/2 vs. ACGT).
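One way to get both datasets into letter coding is to translate the 1/2 calls using the array manifest, which lists for each SNP the nucleotides behind Illumina alleles A and B. Below is a minimal Python sketch, assuming that 1 = allele A, 2 = allele B and 0 = missing in the PED file, and that you have already built a dictionary `ab_to_base` (a hypothetical name) from the manifest. The letters you get are on whatever strand the manifest columns you used refer to (Illumina's TOP strand is not the same as the forward strand), so they still have to be aligned with the controls afterwards.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping built from the array manifest:
# SNP ID -> (nucleotide for Illumina allele A, nucleotide for allele B)
AlleleMap = Dict[str, Tuple[str, str]]


def recode_ped_line(line: str, snp_ids: List[str], ab_to_base: AlleleMap) -> str:
    """Recode one PED line from 1/2 allele codes to nucleotide letters.

    Assumes 1 = allele A, 2 = allele B, 0 = missing; snp_ids must be in
    the same order as the SNPs in the accompanying MAP file.
    """
    fields = line.split()
    header, genos = fields[:6], fields[6:]  # 6 leading columns: FID IID PID MID SEX PHENO
    if len(genos) != 2 * len(snp_ids):
        raise ValueError("genotype column count does not match the SNP list")
    out = []
    for i, snp in enumerate(snp_ids):
        base_a, base_b = ab_to_base[snp]
        code = {"1": base_a, "2": base_b, "0": "0"}  # keep missing calls as 0
        out.append(code[genos[2 * i]])
        out.append(code[genos[2 * i + 1]])
    return " ".join(header + out)
```

For example, `recode_ped_line("FAM1 IND1 0 0 1 2 1 2 2 2 0 0", ["rs1", "rs2", "rs3"], {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("A", "C")})` turns the genotypes into `A G T T 0 0`.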
Moreover, I know that Illumina A/B calls resolve the strand problem for A/T and C/G SNPs, but apparently this is not resolved in my control dataset. So before merging, should I remove all such SNPs from both datasets to avoid ambiguity?
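Once both datasets are in letter coding, each shared SNP can be sorted into one of four groups: alleles already match, alleles match after taking the complement (a strand flip), strand-ambiguous (A/T or C/G, where a flip cannot be detected from the letters alone), or mismatched. The sketch below is one way to do that, assuming you have `(allele1, allele2)` pairs per SNP for both datasets, e.g. read from the BIM files; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim_alleles(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map SNP ID -> (allele1, allele2) from BIM lines (chr, id, cM, bp, a1, a2)."""
    alleles = {}
    for line in lines:
        _chrom, snp, _cm, _pos, a1, a2 = line.split()
        alleles[snp] = (a1, a2)
    return alleles


def is_ambiguous(a1: str, a2: str) -> bool:
    """True for A/T and C/G SNPs, whose complement is the same allele pair."""
    return COMPLEMENT.get(a1) == a2


def classify(case: Tuple[str, str], control: Tuple[str, str]) -> str:
    """Return 'match', 'flip', 'ambiguous', 'mismatch' or 'unresolved'."""
    if any(a not in COMPLEMENT for a in case + control):
        return "unresolved"  # e.g. monomorphic SNPs coded with allele 0
    if is_ambiguous(*case) or is_ambiguous(*control):
        return "ambiguous"
    if set(case) == set(control):
        return "match"
    if {COMPLEMENT[a] for a in case} == set(control):
        return "flip"
    return "mismatch"


def partition(case_alleles: Dict[str, Tuple[str, str]],
              control_alleles: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the SNPs present in both datasets by their classification."""
    groups: Dict[str, List[str]] = {}
    for snp in sorted(case_alleles.keys() & control_alleles.keys()):
        label = classify(case_alleles[snp], control_alleles[snp])
        groups.setdefault(label, []).append(snp)
    return groups
```

The IDs in the "flip" group could then be written to a file for PLINK's `--flip`, and the "ambiguous" and "mismatch" groups to a file for `--exclude`, before merging. Dropping the A/T and C/G SNPs is the conservative choice; it loses some markers but avoids silently merging alleles on opposite strands.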
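The fact that cases and controls are on two separate platforms is particularly worrying, because any platform-specific genotyping artifact will be completely confounded with case/control status. Be careful.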
Larry, David, thank you for the warning! But do you have any ideas on how to align one dataset with the other?
I am not certain, because we have no experience with this. My first inclination would be to look at haplotypes and check that they are consistent between the two datasets. If a haplotype that looks clean in controls is scrambled in cases, merging is more problematic.
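A cheap proxy for that haplotype check, which does not require phasing, is to compare the sign of the genotype correlation (a stand-in for LD) between neighbouring SNPs in each dataset: if one SNP is on the wrong strand or has its alleles swapped, the correlation with its neighbours changes sign. This is a swapped-in technique rather than a true haplotype comparison. The sketch assumes both datasets count the same allele at every SNP (0/1/2 dosages, after the alignment step), SNPs are in map order, and `min_abs_r` is a hypothetical threshold that ignores pairs in weak LD.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None if either SNP is monomorphic in the sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def flag_sign_flips(case_dosages: List[List[int]],
                    control_dosages: List[List[int]],
                    min_abs_r: float = 0.3) -> List[int]:
    """Return indices j where SNPs j and j+1 are correlated with opposite signs.

    case_dosages[j] holds the allele counts of SNP j for every case;
    control_dosages[j] likewise for controls. A flagged j means SNP j or
    SNP j+1 is suspect and worth checking for a strand or allele problem.
    """
    flagged = []
    for j in range(len(case_dosages) - 1):
        r_case = pearson(case_dosages[j], case_dosages[j + 1])
        r_ctrl = pearson(control_dosages[j], control_dosages[j + 1])
        if r_case is None or r_ctrl is None:
            continue  # skip pairs involving a monomorphic SNP
        if abs(r_case) >= min_abs_r and abs(r_ctrl) >= min_abs_r and r_case * r_ctrl < 0:
            flagged.append(j)
    return flagged
```

A SNP that is flagged together with both of its neighbours is the most likely culprit; isolated flags in regions of weak LD are less informative.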