I am calculating local ancestry using LAMP-LD.
I have produced phased ancestral haplotypes for only those SNPs in 1kG that are also in Omni 1M.
Some of these SNPs are not found in my GWAS data, which is currently in PLINK format, but the goal is to have an identical list of SNPs for both the ancestral populations and the admixed case and control populations. Instead of just taking the intersection, I would like to ADD to the GWAS data the ancestral SNPs that are missing from the admixed samples. This would let me make use of all the ancestral data instead of trimming it down.
To do this, I scanned the PLINK v1.9 documentation for an option that would let me submit a list of SNPs and then:
1. remove everything not in the ancestral data from the GWAS data
2. ADD every SNP that is in the ancestral data but not in the GWAS data, and set its genotypes to missing.
Is there a pre-existing program that does this, or do I need to write it myself?
Thank you
Ah - one caveat - suppose the ancestral information cannot be easily transformed into bed format?
What format is it in? Worst case, you can use a short shell script to generate, say, a 1-sample .tped file with nothing but missing genotypes.
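As a rough illustration of that worst-case route, here is a minimal sketch, written in Python rather than shell purely for readability. It assumes you can get the ancestral SNPs as (chromosome, ID, cM, bp) tuples and that your GWAS data has a standard .bim file; the function names, the `DUMMY` sample ID, and the file prefix are all made up for this example.

```python
import os
import tempfile


def read_bim_ids(bim_path):
    """Return the set of variant IDs (second column) in a PLINK .bim file."""
    with open(bim_path) as fh:
        return {line.split()[1] for line in fh if line.strip()}


def write_missing_tped(snps, gwas_ids, prefix):
    """Write <prefix>.tped and <prefix>.tfam for one dummy sample.

    snps: iterable of (chrom, snp_id, cm, bp) for the ancestral panel.
    Only SNPs whose ID is absent from gwas_ids are written, each with
    the missing genotype "0 0". Returns the list of IDs written.
    """
    to_add = [s for s in snps if s[1] not in gwas_ids]
    with open(prefix + ".tped", "w") as tped:
        for chrom, snp_id, cm, bp in to_add:
            # .tped columns: chr, ID, cM, bp, then two alleles per sample
            tped.write(f"{chrom} {snp_id} {cm} {bp} 0 0\n")
    with open(prefix + ".tfam", "w") as tfam:
        # .tfam columns: FID, IID, father, mother, sex, phenotype
        tfam.write("DUMMY DUMMY 0 0 0 -9\n")
    return [s[1] for s in to_add]


if __name__ == "__main__":
    # Toy demonstration with made-up SNPs in a temporary directory.
    workdir = tempfile.mkdtemp()
    bim = os.path.join(workdir, "gwas.bim")
    with open(bim, "w") as fh:
        fh.write("1\trs1\t0\t100\tA\tG\n")
    ancestral = [("1", "rs1", 0, 100), ("1", "rs2", 0, 200)]
    added = write_missing_tped(ancestral, read_bim_ids(bim),
                               os.path.join(workdir, "dummy"))
    print("SNPs written as missing:", added)
```

From there, one plausible (untested) sequence would be to convert the dummy fileset with `--tfile ... --make-bed`, merge it into the GWAS data with `--bmerge`, keep only the ancestral SNPs with `--extract`, and drop the dummy sample with `--remove`. Matching is by SNP ID here, so this only works if the IDs are consistent between the two datasets.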