Question

Filter according to a list of snps, setting SNPs not found in data as missing

0

10.4 years ago

vlaufer ▴ 290

I am calculating local ancestry using Lamp-LD.

I have produced phased ancestral haplotypes for only those SNPs in 1kG that are also on the Omni 1M.

Some of these SNPs are not found in my GWAS data, which is currently in PLINK format, but the goal is to have an identical list of SNPs for both the ancestral populations and the admixed case and control populations. To do this, instead of just taking the intersection, I would like to ADD the SNPs that are missing from the admixed samples to the GWAS data. This would let me use all of the ancestral data instead of trimming it down.

So, to do this, I scanned the PLINK documentation for v1.9 for an option that would enable me to submit a list of SNPs and then:

1. remove everything not in the ancestral data from the GWAS data

2. ADD everything that is in the ancestral data but absent from the GWAS data, and set those genotypes to missing.

Is there a pre-existing program that does this, or do I need to write it myself?

Thank you

plink missing SNP GWAS • 2.3k views

Updated 3.0 years ago by Ram 44k • written 10.4 years ago by vlaufer ▴ 290

Ram · Accepted Answer · 2014-07-27

4

10.4 years ago

chrchang523 11k

plink --bfile [your GWAS data] --extract [ancestral SNP list] --make-bed --out filtered_gwas
plink --bfile filtered_gwas --bmerge ancestral --out combined

You can then use --keep/--remove to filter out the actual ancestral calls, if you want.
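For the --remove step, a minimal sketch might look like the following. It assumes the ancestral fileset is called "ancestral" and the merged output "combined", as in the commands above; the names gwas_only and ancestral_samples.txt are made up for illustration. --remove expects a text file with family ID and individual ID in the first two columns, which can be pulled straight from the ancestral .fam file.

```shell
#!/bin/sh
# Sketch only: "ancestral" and "combined" follow the commands above;
# "gwas_only" and "ancestral_samples.txt" are hypothetical names.

# Print the FID/IID pairs from a .fam file, the layout --remove expects.
make_remove_list() {
    awk '{ print $1, $2 }' "$1"
}

# Only attempt the PLINK step when plink and the input files are present.
if command -v plink >/dev/null 2>&1 && [ -f ancestral.fam ] && [ -f combined.bed ]; then
    make_remove_list ancestral.fam > ancestral_samples.txt
    # Drop the ancestral samples; the merged SNP set is kept.
    plink --bfile combined --remove ancestral_samples.txt --make-bed --out gwas_only
fi
```

After this, gwas_only should contain only the GWAS samples, with the SNPs that came in from the ancestral data set to missing for all of them.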

0

Ah - one caveat - suppose the ancestral information cannot be easily transformed into bed format?

Reply by vlaufer ▴ 290, 10.4 years ago

0

What format is it in? Worst case, you can use a short shell script to generate, say, a 1-sample .tped file with nothing but missing genotypes.
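As a rough illustration of that worst case, the sketch below builds a 1-sample .tped/.tfam pair in which every genotype is missing. It assumes the ancestral SNPs are available as a 4-column whitespace-separated table in PLINK .map layout (chromosome, SNP ID, genetic distance, bp position); the file name ancestral_snps.map, the prefix "dummy", and the sample ID DUMMY are all made up for the example.

```shell
#!/bin/sh
# Sketch only. Hypothetical input: a 4-column SNP table in PLINK .map layout
# (chromosome, SNP ID, genetic distance, bp position).

# make_missing_tfile MAPFILE OUTPREFIX
make_missing_tfile() {
    # .tped: the four map columns, then one genotype per sample; "0 0" = missing.
    awk '{ print $1, $2, $3, $4, "0", "0" }' "$1" > "$2.tped"
    # .tfam: FID IID father mother sex phenotype for the single dummy sample.
    echo "DUMMY DUMMY 0 0 0 -9" > "$2.tfam"
}

# Only attempt the PLINK step when plink and the input file are present.
if command -v plink >/dev/null 2>&1 && [ -f ancestral_snps.map ]; then
    make_missing_tfile ancestral_snps.map dummy
    # Convert the transposed text fileset to a binary fileset.
    plink --tfile dummy --make-bed --out dummy
fi
```

The resulting dummy fileset could then be given to --bmerge in place of the ancestral one, and the DUMMY sample dropped afterwards with --remove.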

Reply by chrchang523 11k, 10.4 years ago (updated 3.0 years ago by Ram 44k)