Hello,
I have two SNP datasets that I need to merge. I have created a file called "shared.bim", which contains all the shared sites, and a reference_allele.list to reorder the sites in both files before merging.
Merging without first removing the non-shared sites gives an error ("Warning - impossible allele assignment"), so I need to remove the non-shared sites from both datasets.
I know I need to build a command that contains --recode and --make-bed and supply a SNP list file to indicate what to remove; I just don't understand how to create the list of SNPs that I need to remove.
Is there a way to simply say in a script: "remove the sites not contained in shared.bim"?
Thank you
Hello,
I don't have problems with the merging process itself, only with how to flag the non-shared sites.
I did create a text file called "shared_sites.txt", which contains the shared sites taken from shared.bim.
I wrote this script to prepare for merging (I will merge in VCFtools):
plink --bfile dataset1 --recode vcf --a2-allele reference_allele.list --keep-clusters shared_sites.txt --out dataset1.filtered
I added --keep-clusters shared_sites.txt to try to flag the shared sites to keep, while --a2-allele reference_allele.list is meant to reorder the SNPs according to the list.
Is the --keep-clusters option going to work for this purpose?
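
For reference, one possible way to express "keep only the sites listed in shared.bim" is sketched below. It assumes PLINK 1.9 is installed as plink, that shared.bim is a standard .bim file with the variant ID in column 2, and that the second dataset is named dataset2 (that name is hypothetical; only dataset1 appears above). make_snp_list and shared_snps.txt are also illustrative names. As far as the PLINK 1.9 documentation describes it, --keep-clusters filters samples by cluster membership (together with --within), not variants, so the sketch uses --extract with a list of variant IDs instead.

```shell
#!/bin/sh
# Write the variant IDs (column 2 of a PLINK .bim file) to a list, one per line.
# make_snp_list is a hypothetical helper name, not a PLINK command.
make_snp_list() {
    awk '{ print $2 }' "$1" > "$2"
}

# Run only if shared.bim is present and plink is installed.
if [ -f shared.bim ] && command -v plink >/dev/null 2>&1; then
    make_snp_list shared.bim shared_snps.txt
    # dataset2 is an assumed name for the second dataset.
    for ds in dataset1 dataset2; do
        # --extract keeps only the variants whose IDs are in the list.
        plink --bfile "$ds" --extract shared_snps.txt --make-bed --out "${ds}.shared"
    done
fi
```

This only works if the variant IDs match between shared.bim and each dataset's own .bim file. Replacing --make-bed with --recode vcf should give VCF output instead, if that is what the later merge expects.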