How To Map SNPs To Genes Based On Recombination Hotspots And LD
13.1 years ago
Tafelplankje ▴ 120

Hi, I am looking for a way to map SNPs to genes that takes into account not only position but also recombination hotspots and LD, using the following algorithm:

  1. The wingspan of a SNP is defined as the region containing all SNPs with r2 > 0.5 (HapMap/1000G) to the associated SNP, extended to the nearest recombination hotspot.
  2. A gene's residence is defined as 110 kb upstream and 40 kb downstream of the coding region of the gene's largest isoform (from Ensembl, for example).

-->Then the wingspan of the SNP is mapped/overlapped to the Gene's residence.
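
Step 2 and the final overlap can be sketched in Python as below. This is only a minimal illustration, not DAPPLE's code: the `Interval` type, the function names and the example coordinates are hypothetical, and it assumes that "upstream" is meant relative to the gene's strand and that coordinates are 1-based and inclusive.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)


def gene_residence(chrom: str, cds_start: int, cds_end: int, strand: str,
                   upstream: int = 110_000, downstream: int = 40_000) -> Interval:
    """Extend a coding region by `upstream`/`downstream` bp.

    Assumption: upstream/downstream are interpreted relative to the strand,
    so on the minus strand the larger extension goes to the right.
    """
    if strand == "+":
        start, end = cds_start - upstream, cds_end + downstream
    elif strand == "-":
        start, end = cds_start - downstream, cds_end + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return Interval(chrom, max(1, start), end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_for_wingspan(wingspan: Interval,
                       residences: Dict[str, Interval]) -> List[str]:
    """Return the names of all genes whose residence overlaps the wingspan."""
    return [gene for gene, res in residences.items() if overlaps(wingspan, res)]


# Made-up coordinates, for illustration only.
residences = {
    "GENE_A": gene_residence("1", 1_000_000, 1_050_000, "+"),
    "GENE_B": gene_residence("1", 1_300_000, 1_320_000, "-"),
}
wingspan = Interval("1", 1_080_000, 1_100_000)
print(genes_for_wingspan(wingspan, residences))
```

For many SNPs and genes, an interval tree or a tool such as bedtools would scale better than this all-against-all comparison.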

This method is used in DAPPLE (http://www.broadinstitute.org/mpg/dapple/dapple.php), a pathway analysis tool. I have difficulties with the first part, especially extending the region to the nearest recombination hotspot.

What would be a good way to map SNPs to genes using the above-mentioned method? How would you do this?

snp gene mapping linkage recombination • 5.7k views
13.1 years ago
Michael 55k

We have made a little tool in R that does this -- almost. Look at this answer to Larry's question.

What is lacking are the recombination hotspots as additional data to extend the boundaries, but the LD-based binning with an r2 cut-off on HapMap phase 3 is there. I think it is good to start with LD-based binning alone with a sensible cut-off. Using hotspots as boundaries seems an interesting idea to me, but isn't there a risk of overextending the SNP-gene assignments with the very large 'wingspan' regions that you will probably get when using these hotspots? This might be a problem especially for pathway analysis. An r2 of 0.5 already seems very low to me.

I'm interested in this approach to assign loci using lead SNPs from GWAS data. Often a GWAS signal is right in between two recombination peaks.

Maybe you can look at the R package: either it already does what you want, or you can suggest a modification; the code is there to look at. Just give me feedback, and we can try it out.

I cannot find an option to annotate the SNPs to the genes in a simple way like SNP-GENE. I only see the plink output option, but this is not really useful. Am I missing something? Thank you!

If you use the option scoring.function = "get.snps", this will generate a list of SNPs per gene separated by ';'. As we take a gene-centric approach, we compute assignments GENE -> SNPs or GENE -> p-values; since we focus on compound p-values, we don't output it the other way around, though I guess it would be quite easy to implement such an output format too. By the way, the plink output option is very useful (for us ;)) to run other plink methods on the SNP sets. If you want, contact us and specify a useful output format. I cannot promise anything, but we can try to build something.
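
Turning a GENE -> SNPs listing into SNP-GENE pairs is a small post-processing step. The sketch below is in Python rather than R and is not part of the package: it assumes the per-gene result has already been read into a dictionary mapping each gene name to its ';'-separated SNP string, and the function name and example IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def snp_gene_pairs(gene_to_snps: Dict[str, str],
                   sep: str = ";") -> List[Tuple[str, str]]:
    """Invert GENE -> 'snp1;snp2;...' into a sorted list of (SNP, GENE) pairs.

    A SNP assigned to several genes yields one pair per gene.
    """
    pairs = []
    for gene, snp_field in gene_to_snps.items():
        for snp in snp_field.split(sep):
            snp = snp.strip()
            if snp:  # skip empty fields, e.g. from a trailing separator
                pairs.append((snp, gene))
    return sorted(pairs)


# Made-up identifiers, for illustration only.
example = {"GENE_A": "rs1;rs2", "GENE_B": "rs2;rs3"}
for snp, gene in snp_gene_pairs(example):
    print(f"{snp}\t{gene}")
```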

13.1 years ago

Thanks, Michael, for the pointer. When I needed recombination hotspot data, I wrote to Gil McVean for those. More info can be found in my responses to three BioStar questions on LD and related topics: 5425, 8008, 12355.

13.0 years ago
Liz ▴ 10

In case you haven't found an answer to this yet: just download SNP LD data, SNP position data and hotspot location data from www.hapmap.org. Use the LD data to grab SNPs with r2 > 0.5 to your SNP, and then just crawl out to the nearest hotspots defined in the hotspots file. Alternatively, DAPPLE will do this for you :-)
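
The "grab and crawl" step could look roughly like the Python sketch below. It is a simplification under stated assumptions, not DAPPLE's implementation: the LD data are assumed to be already parsed into (position, r2) pairs for the lead SNP on one chromosome, each hotspot is reduced to a single position (for example its midpoint) in a sorted list, and a side with no hotspot beyond the LD block is left at the LD block boundary. The function names and all coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple


def ld_block(lead_pos: int, ld_partners: Iterable[Tuple[int, float]],
             r2_min: float = 0.5) -> Tuple[int, int]:
    """Span of the lead SNP plus all partner SNPs with r2 > r2_min."""
    positions = [lead_pos] + [pos for pos, r2 in ld_partners if r2 > r2_min]
    return min(positions), max(positions)


def extend_to_hotspots(left: int, right: int,
                       hotspots: List[int]) -> Tuple[int, int]:
    """Extend [left, right] outward to the nearest hotspot on each side.

    `hotspots` must be sorted ascending. Hotspots inside the block are
    ignored here, which is a simplification.
    """
    i = bisect_right(hotspots, left)   # hotspots[:i] are <= left
    j = bisect_left(hotspots, right)   # hotspots[j:] are >= right
    start = hotspots[i - 1] if i > 0 else left
    end = hotspots[j] if j < len(hotspots) else right
    return start, end


# Made-up positions and r2 values, for illustration only.
partners = [(990_000, 0.8), (1_020_000, 0.6), (1_200_000, 0.2), (950_000, 0.5)]
hotspots = [900_000, 980_000, 1_050_000, 1_300_000]

block = ld_block(1_000_000, partners)
wingspan = extend_to_hotspots(*block, hotspots)
print(block, wingspan)
```

With real HapMap files, hotspots come as start/end intervals, so one would need to decide whether the wingspan stops at the near edge, the midpoint or the far edge of each hotspot.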

'just', yes sounds so simple xD
