I am merging data sets, and the SNP IDs between the two are inconsistent, likely because Affx numbers changed between annotations. For example, merging SNP lists (in R) based on SNP ID loses me ~50,000 SNPs, but merging based on coordinate will only lose me ~10,000 SNPs (i.e., 545,956 sites vs. 585,413 sites).
If I have a command such as:
plink --file data1 --merge data2.ped data2.map --recode --out merge
What do I do to tell Plink to ignore the SNP IDs from data2 and merge based on coordinate, not SNP ID?
Thanks!
-Deven
You could replace each SNP ID with its position in the map or bim file, e.g. chr1:123456789. Once both data sets use position-based IDs, Plink will match them on those.
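A minimal sketch of that renaming, assuming the standard 4-column .map layout (chromosome, SNP ID, genetic distance, base-pair position); a .bim file has the same first four columns. The function name and the sample IDs below are made up for illustration, and the new IDs take the form chromosome:position with no "chr" prefix.

```python
def rename_ids_by_position(lines):
    """Replace column 2 (the SNP ID) with 'chromosome:position'."""
    renamed = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumed column order: chromosome, SNP ID, genetic distance, bp position
        fields[1] = f"{fields[0]}:{fields[3]}"
        renamed.append("\t".join(fields))
    return renamed


# Hypothetical map lines, not taken from the real data sets
example = ["1\tAffx-1234\t0\t123456789", "2\trs999\t0\t5000"]
renamed_example = rename_ids_by_position(example)
print("\n".join(renamed_example))
```

Run this over the map file of both data sets, write the result back out, and then the same merge command should line the SNPs up by coordinate. One caveat: if two SNPs in one file share a position, they will end up with the same ID, so it is worth checking for duplicates first.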
I am not quite following what you mean, and I think that method may actually take longer than just getting Plink to ignore the SNP IDs.
Currently I have:
a) the two data sets in map/ped format
b) a list of 585,413 coordinates that match
c) a map file for the failed merger containing the 545,956 sites where both the SNP ID and the coordinates match
To begin with, I am not sure how to properly isolate the coordinates for the 39,457 sites that do not have matching SNP IDs. After doing that I would need to find the old SNP ID and the new SNP ID for each.
Isn't there some simple way to tell Plink to ignore the SNP ids during merger?