Flip GWAS Data To Positive Strand (hg19 Build 37)
10.7 years ago
Kevin ▴ 50

I have GWAS data from the Illumina HumanOmniExpress BeadChip in PLINK format. What is the easiest way to find SNPs that are not mapped to the positive strand (using reference hg19/b37) and flip them? I know PLINK has the --flip option, but it needs a list of SNPs to flip. How do I generate this list?

plink snps gwas • 13k views

It can be a bit messy: get the SNP names from the PLINK MAP file, get strands and alleles from the UCSC tables, check whether the alleles match, add the strands, then flip. Or download the SNP file from Illumina to get the strands?
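
The lookup described above could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline. It assumes the strand annotation has already been exported to a tab-separated file with the SNP name in the first column and `+` or `-` in the second; that file layout and the function names (`read_map_snps`, `read_strands`, `snps_to_flip`, `write_flip_list`) are made up for this example. It also skips the allele-matching step.

```python
import csv


def read_map_snps(map_path):
    """Return SNP names from a PLINK .map file (second whitespace-separated column)."""
    snps = []
    with open(map_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                snps.append(fields[1])
    return snps


def read_strands(strand_path):
    """Read a hypothetical two-column TSV: SNP name, strand ('+' or '-')."""
    strands = {}
    with open(strand_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                strands[row[0]] = row[1]
    return strands


def snps_to_flip(snp_names, strands):
    """Keep SNPs annotated on the minus strand; unannotated SNPs are left out."""
    return [name for name in snp_names if strands.get(name) == "-"]


def write_flip_list(snp_names, out_path):
    """Write one SNP name per line, i.e. a plain SNP list for --flip."""
    with open(out_path, "w") as handle:
        for name in snp_names:
            handle.write(name + "\n")
```

The file written by `write_flip_list` is the kind of SNP list that the --flip option mentioned in the question needs. SNPs missing from the annotation are silently skipped here, so it would be worth counting them before trusting the result.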

No, it is very easy. Please see: https://github.com/endrebak/snp-flip

10.3 years ago

I wrote a command line tool to do this very thing. Please see https://github.com/endrebak/snp-flip

The tool works right out of the box as long as you have Biopython installed and a reference genome to do lookups in. See the README.md in the GitHub repo for examples and documentation. It comes with example files to play around with.

Comments appreciated.

How does your tool treat A/T and C/G SNPs? As far as I know, the PLINK format doesn't store which allele is the reference allele. Instead, PLINK assigns the minor allele to allele 1 and the major allele to allele 2. The major allele is not always the reference allele.

I use a reference genome to decide which is the reference allele; I do not consider either of the alleles in the PLINK file a reference allele (but perhaps that should be an option for old-style PLINK files?). So if the PLINK file says A1 and A2 are A and T, that SNP is considered ambiguous. I should add that as an example to the README.md and explain how the tool works a bit better.
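
To make the idea concrete, here is a minimal sketch of that kind of check. It is not the actual snp-flip code; the function name `classify_snp` and the labels it returns are invented for illustration, and it assumes the reference base at the SNP position has already been looked up in the genome.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snp(ref_base, allele1, allele2):
    """Classify a biallelic SNP against the reference base at its position.

    Returns 'ambiguous', 'forward', 'reverse', or 'unresolved'.
    """
    ref = ref_base.upper()
    alleles = {allele1.upper(), allele2.upper()}
    # Missing calls ('0'), indels, monomorphic SNPs and N bases cannot be judged.
    if ref not in COMPLEMENT or len(alleles) != 2 or not alleles <= set(COMPLEMENT):
        return "unresolved"
    flipped = {COMPLEMENT[a] for a in alleles}
    # A/T and C/G are their own complements, so the strand cannot be inferred.
    if alleles == flipped:
        return "ambiguous"
    # For every other pair, the alleles and their complements together cover
    # all four bases, so the reference base is in exactly one of the two sets.
    if ref in alleles:
        return "forward"
    return "reverse"
```

For example, `classify_snp("A", "A", "G")` gives `'forward'`, `classify_snp("A", "T", "C")` gives `'reverse'`, and `classify_snp("A", "A", "T")` gives `'ambiguous'`. Note that a check like this cannot detect a wrong position or a wrong allele pair, since any non-palindromic pair fits one strand or the other.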

So your tool labels all A/T and G/C SNPs as ambiguous (meaning they should be deleted)? Deciding whether an A/T or G/C SNP has to be flipped is the major issue in the flipping process. Since you delete them all, you don't offer a solution for that.

Whether you should delete them is up to you. If you find that none (or all) of the nonambiguous SNPs are on the reference strand, you should probably keep the ambiguous ones and flip all of them (or none of them).

Given only a PLINK file and a reference genome, it is impossible to solve this problem in general; you need manifest files or more information (but be warned: down this road lies insanity, with so many quirks, issues and bad data). These files might help, though: http://www.well.ox.ac.uk/~wrayner/strand/
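
The rule of thumb above (all or none of the nonambiguous SNPs on the reference strand) could be written down roughly like this. It is a sketch only: the function name `decide_ambiguous` and the labels `'forward'`, `'reverse'`, `'keep_unflipped'`, `'keep_and_flip_all'` and `'undecidable'` are invented for the example, and it assumes you already have a strand call for each nonambiguous SNP.

```python
from collections import Counter


def decide_ambiguous(strand_calls):
    """Suggest what to do with A/T and C/G SNPs, given the other SNPs' strands.

    strand_calls: iterable of 'forward' / 'reverse' labels, one per
    nonambiguous SNP in the same dataset.
    """
    counts = Counter(strand_calls)
    forward, reverse = counts["forward"], counts["reverse"]
    if forward == 0 and reverse == 0:
        return "undecidable"        # nothing to base a decision on
    if reverse == 0:
        return "keep_unflipped"     # whole dataset looks like it is on the reference strand
    if forward == 0:
        return "keep_and_flip_all"  # whole dataset looks like it is on the opposite strand
    return "undecidable"            # mixed strands: needs manifest or strand files
```

In the mixed case the choice is between dropping the ambiguous SNPs and resolving them from manifest or strand files such as the ones linked above.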

I ran into something even worse. Because the data were shared by others in PLINK format, the ref/alt alleles had been changed; on top of that, the alleles in one dataset are based on TOP, while another dataset is based on BOTTOM. Merging these two datasets is terrible.

10.6 years ago

You should look in the original output file (finalreport). This file should have the Top alleles (Illumina nomenclature) and the forward alleles. If you do the flip by comparing your minor allele with the one from UCSC, all SNPs with a MAF around 45% are problematic, especially A/T and C/G SNPs.
