Hi, I am looking for a way to map SNPs to genes that takes into account not only position but also recombination hotspots and LD, using the following algorithm:
- The wingspan of a SNP is defined as the region containing all SNPs with r2 > 0.5 (HapMap/1000G) to the associated SNP, extended to the nearest recombination hotspot.
- A gene's residence is defined as 110kb upstream and 40kb downstream of the coding region of the gene's largest isoform (from Ensembl, for example).
--> Then the wingspan of the SNP is overlapped with the gene's residence.
This method is used in DAPPLE (http://www.broadinstitute.org/mpg/dapple/dapple.php), a pathway analysis tool. I have difficulties with the first part, especially extending the region to the nearest recombination hotspot.
What would be a good way to map SNPs to genes using the method described above? How would you do this?
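The steps described above could be sketched roughly as follows in Python. This is only a minimal illustration under several assumptions, not DAPPLE's actual code: all coordinates are on one chromosome, the LD proxies (r2 > 0.5) have already been looked up elsewhere, hotspots are given as single positions (for example interval midpoints), and the function names and the `genes` dictionary layout are made up for this example. If there is no hotspot on one side, the sketch simply keeps the LD boundary on that side.

```python
from bisect import bisect_left, bisect_right


def wingspan(lead_pos, ld_positions, hotspots):
    """Span of the lead SNP and its LD proxies, extended to the nearest hotspots."""
    # Boundaries of the LD block (lead SNP plus proxies with r2 > 0.5).
    lo = min([lead_pos, *ld_positions])
    hi = max([lead_pos, *ld_positions])
    hs = sorted(hotspots)
    # Nearest hotspot at or to the left of the block start.
    i = bisect_right(hs, lo)
    start = hs[i - 1] if i > 0 else lo
    # Nearest hotspot at or to the right of the block end.
    j = bisect_left(hs, hi)
    end = hs[j] if j < len(hs) else hi
    return start, end


def gene_residence(cds_start, cds_end, strand, upstream=110_000, downstream=40_000):
    """Coding region padded by 110kb upstream and 40kb downstream (strand-aware)."""
    if strand == "+":
        return max(0, cds_start - upstream), cds_end + downstream
    # On the minus strand, upstream lies at higher coordinates.
    return max(0, cds_start - downstream), cds_end + upstream


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def map_snp_to_genes(lead_pos, ld_positions, hotspots, genes):
    """genes: {name: (cds_start, cds_end, strand)}; returns names of overlapping genes."""
    span = wingspan(lead_pos, ld_positions, hotspots)
    return [
        name
        for name, (start, end, strand) in genes.items()
        if overlaps(span, gene_residence(start, end, strand))
    ]


# Toy data, invented for illustration only.
hotspots = [500_000, 950_000, 1_100_000, 2_000_000]
genes = {
    "GENE_A": (1_200_000, 1_250_000, "+"),
    "GENE_B": (1_200_000, 1_250_000, "-"),
    "GENE_C": (3_000_000, 3_050_000, "+"),
}
hits = map_snp_to_genes(1_000_000, [980_000, 1_030_000], hotspots, genes)
```

With real data one would run this per chromosome, and decide whether to extend to the near edge, far edge, or midpoint of each hotspot interval.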
I'm interested in this approach to assign loci using lead SNPs from GWAS data. Often a GWAS signal lies right between two recombination peaks.
Maybe you can look at the R package: either it already does what you want, or you can suggest a modification; the code is there to look at. Just give me feedback, and we can try it out.
I cannot find an option to annotate the SNPs to the genes in a simple way, like SNP-GENE. I only see the plink output option, but this is not really useful. Am I missing something? Thank you!
If you use the option scoring.function = "get.snps", this will generate a list of SNPs per gene, separated by ';'. As we take a gene-centric approach, we compute assignments GENE -> SNPs or GENE -> p-values; because we focus on compound p-values, we don't output it the other way around. Though I guess it would be quite easy to implement such an output format too. By the way, the plink output option is very useful (for us ;)) to run other plink methods on the SNP sets. If you want, contact us and specify a useful output format. I cannot promise anything, but we can try to build something.
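In the meantime, turning the gene-centric output around yourself should only take a few lines. Here is a small Python sketch, assuming you have exported the result to something like a mapping from gene name to a ';'-separated SNP string; the dictionary layout and the function name are hypothetical and not part of the package.

```python
def invert_gene_to_snps(gene_to_snps, sep=";"):
    """Turn {gene: "rs1;rs2"} into a sorted list of (snp, gene) pairs."""
    pairs = []
    for gene, snp_field in gene_to_snps.items():
        for snp in snp_field.split(sep):
            snp = snp.strip()
            if snp:  # skip empty entries, e.g. from a trailing separator
                pairs.append((snp, gene))
    return sorted(pairs)


# Toy input, invented for illustration only.
pairs = invert_gene_to_snps({"GENE_A": "rs1;rs2", "GENE_B": "rs2;"})
for snp, gene in pairs:
    print(f"{snp}\t{gene}")
```

A SNP assigned to several genes simply shows up on several lines, one per gene.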