Question

How To Map Snps To Genes Based On Recombination Hotspots And Ld

13.6 years ago

Tafelplankje ▴ 120

Hi, I am looking for a way to map SNPs to genes that takes into account not only the position but also recombination hotspots and LD, using the following algorithm:

The wingspan of a SNP is defined as the region containing the SNPs with r2 > 0.5 (HapMap/1000 Genomes) to the associated SNP, extended to the nearest recombination hotspot.
A gene's residence is defined as the coding region of the gene's largest isoform (from Ensembl, for example) plus 110 kb upstream and 40 kb downstream.

-->Then the wingspan of the SNP is mapped/overlapped to the Gene's residence.
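The residence and overlap steps could be sketched roughly as below. This is only an illustration of the definition given above, not DAPPLE's own code: the function names, the 1-based inclusive coordinates and the strand-aware handling of "upstream" are all assumptions.

```python
from typing import NamedTuple

UPSTREAM_BP = 110_000    # from the definition above
DOWNSTREAM_BP = 40_000   # from the definition above


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int


def gene_residence(chrom: str, start: int, end: int, strand: str) -> Interval:
    """Pad the coding region of the largest isoform (hypothetical helper).

    Assumes 'upstream' is relative to the direction of transcription,
    so the padding is mirrored for genes on the minus strand.
    """
    if strand == "+":
        lo, hi = start - UPSTREAM_BP, end + DOWNSTREAM_BP
    elif strand == "-":
        lo, hi = start - DOWNSTREAM_BP, end + UPSTREAM_BP
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return Interval(chrom, max(1, lo), hi)  # do not run off the chromosome start


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_for_wingspan(wingspan: Interval, genes: dict) -> list:
    """genes maps a gene name to (chrom, start, end, strand)."""
    return [name for name, g in genes.items()
            if overlaps(wingspan, gene_residence(*g))]
```

For many SNPs and genes, an interval tree or a sorted sweep would scale better than this all-against-all comparison.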

This method is used in DAPPLE (http://www.broadinstitute.org/mpg/dapple/dapple.php), a pathway analysis tool. I have difficulties with the first part, especially extending the region to the nearest recombination hotspot.

What would be a good way to map SNPs to genes using the method described above? How would you do this?

snp gene mapping linkage recombination • 6.2k views

ADD COMMENT • link updated 11.5 years ago by Biostar 20 • written 13.6 years ago by Tafelplankje ▴ 120

Ram · Answer 1 · 2011-11-08

13.6 years ago

Michael 55k

We have made a little tool in R that does this -- almost. Look at this answer to Larry's question.

What is lacking are the recombination hotspots as additional data to extend the boundaries, but the LD-based binning with an r2 cut-off on HapMap phase 3 is there. I think it is good to start with LD-based binning alone, with a sensible cut-off. Using hotspots as boundaries seems an interesting idea to me, but isn't there a risk of overextending the SNP-gene assignments if you take the very large 'wingspan' regions that you probably get when using these hotspots? This might be a problem especially for pathway analysis. An r2 of 0.5 already seems very low to me.

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 13.6 years ago by Michael 55k

I'm interested in this approach to assign loci using lead SNPs from GWAS data. Often a GWAS signal is right in between two recombination peaks.

ADD REPLY • link 13.6 years ago by Tafelplankje ▴ 120

Maybe you can look at the R package: either it already does what you want, or you can suggest a modification; the code is there to look at. Just give me feedback, and we can try it out.

ADD REPLY • link 13.6 years ago by Michael 55k

I cannot find an option to annotate the SNPs to the genes in a simple way like SNP-GENE. I only see the plink output option, but this is not really useful. Am I missing something? Thank you!

ADD REPLY • link 13.6 years ago by Tafelplankje ▴ 120

If you use the option scoring.function = "get.snps", this will generate a list of SNPs per gene separated by ';'. As we take a gene-centric approach, we compute assignments GENE -> SNPs or GENE -> p-values; since we focus on compound p-values, we don't output it the other way around. Though I guess it is quite easy to implement such an output format too. By the way, the plink output option is very useful (for us ;)) to run other plink methods on the SNP sets. If you want, contact us and specify a useful output format. I cannot promise anything, but we can try to build something.
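Until such an output format exists, turning a GENE -> SNPs listing into SNP-GENE pairs is a small post-processing step. Here is a minimal Python sketch, assuming the per-gene SNP lists have already been read into a dictionary of ';'-separated strings; the dictionary layout, function name and example identifiers are hypothetical and not part of the R package.

```python
def invert_gene_to_snps(gene_to_snps: dict, sep: str = ";") -> list:
    """Return sorted (snp, gene) pairs from a {gene: 'rsA;rsB'} mapping."""
    pairs = []
    for gene, snp_field in gene_to_snps.items():
        for snp in snp_field.split(sep):
            snp = snp.strip()
            if snp:  # skip empty entries, e.g. from a trailing separator
                pairs.append((snp, gene))
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    example = {"GENE_A": "rs1;rs2", "GENE_B": "rs2;"}
    for snp, gene in invert_gene_to_snps(example):
        print(f"{snp}\t{gene}")
```

A SNP assigned to several genes simply appears on several lines, one per gene.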

ADD REPLY • link 13.6 years ago by Michael 55k

Ram · Answer 2 · 2011-11-08

13.6 years ago

Larry_Parnell 16k

Thanks, Michael, for the pointer. When I needed recombination hotspot data, I wrote to Gil McVean for those. More info can be found in my responses to three BioStar questions on LD and related topics: 5425, 8008, 12355.

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 13.6 years ago by Larry_Parnell 16k

score 0 · Answer 3 · 2011-11-29

13.5 years ago

Liz ▴ 10

In case you haven't found an answer to this yet, just download SNP LD data, SNP position data and hotspot location data from www.hapmap.org. Use the LD data to grab SNPs with r2 > 0.5, and then just crawl out to the nearest hotspots defined in the hotspots file. Alternatively, DAPPLE will do this for you :-)
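The "grab and crawl out" steps might look something like the sketch below. It assumes the LD and hotspot files have already been parsed into plain Python lists for a single chromosome; the function names and data layout are hypothetical. Stopping at the near edge of each flanking hotspot (rather than its centre) is also an assumption, as is ignoring any hotspot that overlaps the LD block itself.

```python
from bisect import bisect_left, bisect_right


def ld_block(lead_pos: int, ld_partners, r2_cutoff: float = 0.5):
    """Span of the lead SNP plus all partner SNPs with r2 above the cut-off.

    ld_partners: iterable of (position, r2) pairs relative to the lead SNP.
    """
    positions = [lead_pos] + [pos for pos, r2 in ld_partners if r2 > r2_cutoff]
    return min(positions), max(positions)


def extend_to_hotspots(block_start: int, block_end: int, hotspots):
    """Crawl outwards to the nearest flanking hotspot on each side.

    hotspots: list of (start, end), sorted by start and non-overlapping.
    If no hotspot exists on one side, the block edge is kept as is.
    """
    starts = [s for s, _ in hotspots]
    ends = [e for _, e in hotspots]
    i = bisect_right(ends, block_start)   # hotspots ending at or before the block
    left = ends[i - 1] if i > 0 else block_start
    j = bisect_left(starts, block_end)    # first hotspot starting at or after the block
    right = starts[j] if j < len(starts) else block_end
    return left, right


def wingspan(lead_pos: int, ld_partners, hotspots, r2_cutoff: float = 0.5):
    start, end = ld_block(lead_pos, ld_partners, r2_cutoff)
    return extend_to_hotspots(start, end, hotspots)
```

With made-up hotspots at (100, 200), (1000, 1100) and (5000, 5200), a lead SNP at 2000 whose partners above the cut-off lie at 1500 and 2600 gets the LD block (1500, 2600) and the wingspan (1100, 5000).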