Hello,
I'm starting a new post about the liftover process for SNPs because several questions have come up during the work I've done so far, and I want to collect all the issues and answers in one place.
My goal: I want to convert thousands of genome-wide SNP positions (SNP array) from hg18 (Ref_36) to hg19 (Ref_37). The data is in plink ped/map file format and grouped by chromosome (22 autosomes plus X & Y).
The first tool I used is of course liftOver from UCSC. This works very well: I transform the map file into a BED file, lift it over, and convert it back again. But I found out that the rs numbers of some SNPs can also change, for example rs2266988 into rs1129172. Therefore I found it necessary to update the rs numbers as well as the positions. I found this nice tutorial and used their python script. Briefly, the script looks up each rs number in two history files from dbSNP (RsMergeArch and SNPHistory) to bring it up to date.
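For reference, the map/BED round trip and the rs number update can be sketched roughly as below. This is only a minimal sketch, not the tutorial's script: the function names are made up for illustration, the chromosome codes 23 and 24 are assumed to mean X and Y, the genetic distance column is simply reset to 0 on the way back, and the merge table is assumed to be already loaded into a dict of old to new rs numbers (parsing RsMergeArch itself is left out).

```python
# Assumed PLINK numeric codes for the sex chromosomes (23 = X, 24 = Y).
_SEX = {"23": "X", "24": "Y"}


def map_line_to_bed(line):
    """Convert one PLINK .map line (chrom, SNP id, cM, 1-based bp) to a BED4 line."""
    chrom, snp, _cm, pos = line.split()[:4]
    chrom = _SEX.get(chrom, chrom)
    pos = int(pos)
    # BED intervals are 0-based and half-open, so a SNP at 1-based
    # position p becomes the interval [p - 1, p).
    return f"chr{chrom}\t{pos - 1}\t{pos}\t{snp}"


def bed_line_to_map(line, cm="0"):
    """Convert one lifted BED4 line back to a PLINK .map line."""
    chrom, _start, end, snp = line.split()[:4]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    # The BED end coordinate is the 1-based SNP position.
    return f"{chrom}\t{snp}\t{cm}\t{end}"


def resolve_rs(rs_id, merged):
    """Follow old -> new merges in `merged` until a current rs number is reached."""
    seen = set()
    while rs_id in merged and rs_id not in seen:
        seen.add(rs_id)  # guard against accidental cycles in the table
        rs_id = merged[rs_id]
    return rs_id
```

For example, `resolve_rs("rs2266988", {"rs2266988": "rs1129172"})` gives `"rs1129172"`, and a SNP that was merged more than once is followed through to its latest rs number.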
After doing this, I continue with the new rs numbers, using biomaRt in R to get the new chromosomal (hg19) position for each SNP. In addition, I used the dbSNP file b138_SNPChrPosOnRef.bcp.gz to update, compare and validate the locations. Here I now run into some trouble when comparing the results. For a small proportion of SNPs, the dbSNP file has the following annotation: "Mapped unambiguously on non-reference assembly only", e.g. rs11090516.
What exactly does this mean? Should I remove those SNPs?
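To make the comparison step concrete, here is a minimal sketch of how lifted positions could be checked against a second source. It assumes both sources have already been loaded into dicts mapping rs number to a (chromosome, position) pair in the same coordinate convention; the function name and the three categories are only illustrative.

```python
def compare_positions(lifted, reference):
    """Sort SNPs into concordant, discordant, and missing-from-reference."""
    report = {"concordant": [], "discordant": [], "missing": []}
    for snp, locus in lifted.items():
        if snp not in reference:
            # No reference-assembly position available for this SNP.
            report["missing"].append(snp)
        elif reference[snp] == locus:
            report["concordant"].append(snp)
        else:
            report["discordant"].append(snp)
    return report
```

SNPs that end up in "missing" or "discordant" are the ones to inspect by hand before deciding whether to exclude them.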
Finally, I want to update the map files and change the corresponding ped files if some SNPs were excluded.
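Dropping excluded SNPs from both files could look roughly like the sketch below. It assumes the standard ped layout of six leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by two allele columns per SNP, in the same order as the map file; the function name is hypothetical. PLINK's own `--exclude` option does the same job and is probably the safer route for real data.

```python
def filter_map_and_ped(map_lines, ped_lines, excluded):
    """Remove the SNPs named in `excluded` from map lines and ped genotype columns."""
    # Indices of the SNPs to keep, in map-file order.
    keep = [i for i, line in enumerate(map_lines)
            if line.split()[1] not in excluded]
    new_map = [map_lines[i] for i in keep]

    new_ped = []
    for line in ped_lines:
        fields = line.split()
        head, geno = fields[:6], fields[6:]
        if len(geno) != 2 * len(map_lines):
            raise ValueError("ped line does not match the number of SNPs in the map")
        kept = []
        for i in keep:
            # Each SNP occupies two consecutive allele columns.
            kept.extend(geno[2 * i:2 * i + 2])
        new_ped.append(" ".join(head + kept))
    return new_map, new_ped
```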
In general, is this a good or appropriate approach to lifting over a SNP array? Do you have any suggestions for improvement?
@Jimbou, hey, I'm working on something similar to the above and could use some guidance, since you have done this before. Is there any way I can contact you about this, if that's OK with you?