Filter according to a list of SNPs, setting SNPs not found in data as missing
10.4 years ago
vlaufer ▴ 290

I am calculating local ancestry using Lamp-LD.

I have produced phased ancestral haplotypes for only those SNPs in 1kG that are also on the Omni 1M.

Some of these SNPs are not found in my GWAS data, which is currently in PLINK format, but the goal is to have an identical list of SNPs for both the ancestral and the admixed case and control populations. To do this, instead of just taking the intersection, I would like to ADD to the GWAS data the SNPs that are missing from the admixed samples. This would let me use all of the ancestral data instead of trimming it down.

So, to do this, I scanned the PLINK documentation for v1.9 for an option that would enable me to submit a list of SNPs and then:

1. remove everything not in the ancestral data from the GWAS data

2. ADD everything that is in the ancestral data to the GWAS data, and set it as missing.

Is there a pre-existing program that does this, or do I need to write it myself?

Thank you

plink missing SNP GWAS • 2.3k views
10.4 years ago

plink --bfile [your GWAS data] --extract [ancestral SNP list] --make-bed --out filtered_gwas
plink --bfile filtered_gwas --bmerge ancestral --out combined

After the merge, SNPs present only in the ancestral data are set to missing in the GWAS samples. You can then use --keep/--remove to filter out the ancestral samples, if you want.
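As a rough sketch of the --remove step, the sample list can be built from the ancestral .fam file, since --remove expects a file with family ID and within-family ID columns. The function name `make_remove_list` and the file names in the comments are hypothetical, chosen to match the commands above.

```shell
# Sketch: list the ancestral samples so they can be dropped after the merge.
# Assumes $1 is the .fam file of the ancestral fileset used with --bmerge
# and $2 is the path of the sample list to write.
make_remove_list() {
    fam_file=$1
    out_file=$2
    # --remove expects family ID and within-family ID: the first two .fam columns
    awk '{ print $1, $2 }' "$fam_file" > "$out_file"
}

# Example use (output prefix combined_gwas_only is hypothetical):
# make_remove_list ancestral.fam ancestral_samples.txt
# plink --bfile combined --remove ancestral_samples.txt --make-bed --out combined_gwas_only
```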


Ah - one caveat - suppose the ancestral information cannot be easily transformed into bed format?


What format is it in? Worst case, you can use a short shell script to generate, say, a 1-sample .tped file with nothing but missing genotypes.
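A minimal sketch of that worst-case script, assuming the ancestral SNP list is available as a whitespace-delimited, 4-column .map-style file (chromosome, SNP ID, genetic distance, base-pair position). The function name `make_missing_tped` and the sample ID `DUMMY` are made up for illustration.

```shell
# Sketch: build a 1-sample transposed fileset whose genotypes are all missing.
# Assumes $1 is a 4-column .map-style file (chr, SNP ID, cM, bp position)
# and $2 is the output prefix.
make_missing_tped() {
    map_file=$1
    prefix=$2
    # .tped: the four map columns followed by one missing genotype ("0 0")
    awk '{ print $1, $2, $3, $4, "0", "0" }' "$map_file" > "$prefix.tped"
    # .tfam: FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
    printf 'DUMMY DUMMY 0 0 0 -9\n' > "$prefix.tfam"
}
```

The result could then be converted with something like `plink --tfile dummy --make-bed --out ancestral` and used as the --bmerge target; the single dummy sample can be dropped afterwards with --remove.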
