I have run a SNP pathway analysis on one dataset (dataset A), and now I want to test whether the significant pathway replicates in a second dataset (dataset B) that was genotyped on a different platform.
The main problem is that many SNPs in my significant pathway from dataset A are not present in dataset B (I don’t want to do imputation for various reasons).
To address this, I have searched dataset A for SNPs that are in high LD with these missing SNPs. I then searched dataset B for the presence of these SNPs, with the idea of adding them to dataset B as proxies.
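For concreteness, here is a minimal Python sketch of that proxy lookup. It assumes the LD information is available as (SNP, SNP, r2) triples, for example taken from the SNP_A, SNP_B and R2 columns of a PLINK --r2 report. The function name, the rs IDs and the 0.8 cut-off are illustrative assumptions only, not values from my data.

```python
def pick_proxies(missing_snps, snps_in_b, ld_pairs, min_r2=0.8):
    """Return {missing SNP: (proxy SNP, r2)}, keeping the strongest proxy.

    min_r2=0.8 is an assumed threshold for "high LD"; adjust as needed.
    """
    missing = set(missing_snps)
    available = set(snps_in_b)
    best = {}
    for snp_a, snp_b, r2 in ld_pairs:
        # LD is symmetric, so check each pair in both directions.
        for target, candidate in ((snp_a, snp_b), (snp_b, snp_a)):
            if target in missing and candidate in available and r2 >= min_r2:
                if target not in best or r2 > best[target][1]:
                    best[target] = (candidate, r2)
    return best


# Made-up example: pathway SNPs rs1, rs2, rs3 are missing from dataset B.
ld_pairs = [
    ("rs1", "rs10", 0.95),
    ("rs1", "rs11", 0.85),
    ("rs2", "rs20", 0.60),  # below the threshold, so rs2 gets no proxy
    ("rs30", "rs3", 0.90),
]
proxies = pick_proxies(
    ["rs1", "rs2", "rs3"],
    ["rs10", "rs11", "rs20", "rs30"],
    ld_pairs,
)
print(proxies)
```

Pathway SNPs that end up without a proxy (rs2 in this toy example) would simply remain untested in dataset B.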
However, I am unsure how to go about ‘adding’ these proxy SNPs to my final dataset (which has been pruned and filtered).
My current strategy is to make a dataset from these proxy SNPs and merge this ‘proxy SNP dataset’ into the filtered dataset (i.e. with the --merge command in PLINK). I know this is not ideal, so I wondered if anyone has thoughts on alternative strategies here?
That sounds reasonable to me. I don't see why --merge would cause a problem here. One thing to keep track of, though: if your dataset has been LD-pruned, remove any SNPs retained by the pruning step that are in LD with the SNPs you are merging in.
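As a rough sketch of that check in Python, assuming you have pairwise r2 values between the proxy SNPs and the SNPs retained after pruning: the function name, the rs IDs and the 0.5 threshold below are hypothetical, and the threshold should match whatever r2 cut-off was used in your pruning step.

```python
def snps_to_drop(pruned_snps, proxy_snps, ld_pairs, max_r2=0.5):
    """List pruned-set SNPs whose r2 with any proxy SNP is >= max_r2.

    max_r2=0.5 is an assumed value; use the r2 threshold from your pruning.
    """
    pruned = set(pruned_snps)
    proxies = set(proxy_snps)
    drop = set()
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 < max_r2:
            continue
        # Check both orientations; never drop a proxy SNP itself.
        if snp_a in proxies and snp_b in pruned and snp_b not in proxies:
            drop.add(snp_b)
        if snp_b in proxies and snp_a in pruned and snp_a not in proxies:
            drop.add(snp_a)
    return sorted(drop)


# Made-up example data.
pruned_snps = ["rs100", "rs101", "rs102"]
proxy_snps = ["rs10", "rs30"]
ld_pairs = [
    ("rs10", "rs100", 0.70),
    ("rs101", "rs30", 0.20),  # weak LD, so rs101 is kept
    ("rs102", "rs30", 0.55),
]
to_drop = snps_to_drop(pruned_snps, proxy_snps, ld_pairs)
print(to_drop)
```

The resulting IDs could be written one per line to a text file and passed to PLINK's --exclude before merging.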
Great, thanks for the advice - I will stick with this strategy then. Re: the LD, yes, I was thinking that too. I will make sure I do that!
Your objective is to verify whether the significant pathway from dataset A replicates in dataset B. Why do you have to merge the datasets? I would do the two analyses independently and then compare at the pathway level.