Question

SNP enrichment analysis - alternatives to BROAD SNPsnap?


10.2 years ago

epaminonda ▴ 10

Hello,

I am trying to carry out a SNP genomic enrichment analysis and I was hoping you could help.

Basically, I have the following two sets of SNPs:

set_A: 1,695 foreground SNPs. These are 1000 Genomes variants that are also QTLs for a trait I'm interested in. All of them lie within ChIP-seq peak intervals for a TF.
set_B: 116,000 background SNPs. This is a superset of set_A, and all of these SNPs lie within ChIP-seq peaks for the same TF. They represent every SNP I tested for the QTL property above.

I want to determine whether set_A is enriched in a particular annotation compared to set_B, that is, compared to all SNPs tested for QTL status in my ChIP-seq peaks. For example, the annotation might be strong-LD intervals around genome-wide significant SNPs from the GWAS catalog. I therefore want to ask:

"Are my set_A variants more likely to be in GWAS LD blocks for some disease/trait compared to the background set of SNPs?"
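For concreteness, the question above can be framed as a one-sided hypergeometric test on SNP counts. The following is only a minimal sketch: the function name and the annotated counts in the usage lines are hypothetical, and the test treats every SNP as an independent draw from set_B, which is exactly the assumption that LD between SNPs violates.

```python
from math import comb


def hypergeom_enrichment(n_bg, n_bg_annot, n_fg, n_fg_annot):
    """Fold enrichment and one-sided p-value for foreground SNPs in an annotation.

    n_bg       -- size of the background set (set_B, which includes set_A)
    n_bg_annot -- background SNPs that fall inside the annotation
    n_fg       -- size of the foreground set (set_A)
    n_fg_annot -- foreground SNPs that fall inside the annotation

    Assumes SNPs are independent draws without replacement from the
    background; LD between SNPs breaks this assumption.
    """
    total = comb(n_bg, n_fg)
    upper = min(n_fg, n_bg_annot)
    # P(X >= n_fg_annot) under the hypergeometric null.
    tail = sum(
        comb(n_bg_annot, k) * comb(n_bg - n_bg_annot, n_fg - k)
        for k in range(n_fg_annot, upper + 1)
    )
    p_value = tail / total
    fold = (n_fg_annot / n_fg) / (n_bg_annot / n_bg)
    return fold, p_value


# Set sizes are from the post; the annotated counts (3000 and 70) are made up.
fold, p_value = hypergeom_enrichment(116_000, 3_000, 1_695, 70)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids an external dependency; a simulation-based tool such as GAT answers a similar question while also accounting for interval structure.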

I have already checked that set_A is MAF-matched to set_B (bootstrapped KS test of the two MAF distributions), so this should not be a problem. I ran the simulation-based enrichment tool GAT (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3722528/), which works fine and has returned enrichment results.

However, I believe my foreground and background sets need more pre-processing: there is LD structure both within set_A and within set_B, so some SNPs in set_A are in LD with each other, and the same holds within set_B. I believe I need to correct for this as well, to avoid inflated enrichment estimates. I would probably need to LD-match set_A and set_B, or else keep only a subsample of mutually independent SNPs from each set. GAT, which is designed to compute simple interval enrichments, cannot do this.
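To illustrate the "subsample independent SNPs" option, here is a minimal sketch of greedy LD pruning. It assumes pairwise r² values have already been computed elsewhere and are supplied as a dictionary keyed by unordered SNP pairs. The function name, the input layout, and the 0.2 threshold are all assumptions for illustration, not part of any tool mentioned here.

```python
def greedy_ld_prune(snp_ids, r2, threshold=0.2):
    """Keep a subset of SNPs in which no pair has r2 >= threshold.

    snp_ids   -- SNP identifiers in priority order (earlier ones are kept first)
    r2        -- dict mapping frozenset({snp_a, snp_b}) to pairwise r^2;
                 pairs missing from the dict are treated as unlinked (r^2 = 0)
    threshold -- hypothetical cutoff; choose one appropriate for the study
    """
    kept = []
    for snp in snp_ids:
        # Keep this SNP only if it is not in LD with anything already kept.
        if all(r2.get(frozenset((snp, other)), 0.0) < threshold for other in kept):
            kept.append(snp)
    return kept


# Toy example with made-up identifiers and r^2 values.
toy_r2 = {
    frozenset(("rs1", "rs2")): 0.9,
    frozenset(("rs2", "rs3")): 0.1,
}
print(greedy_ld_prune(["rs1", "rs2", "rs3"], toy_r2))
```

The result depends on the input order, so sorting SNPs by some priority (for example QTL strength) before pruning decides which SNP represents each LD cluster. The same pruning would have to be applied to both set_A and set_B for the comparison to stay fair.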

A tool from the Broad Institute, SNPsnap, might be able to help: http://www.broadinstitute.org/mpg/snpsnap/

Interestingly, SNPsnap should be able to carry out LD clumping of the foreground SNPs, so it can correct for LD-derived inflation of enrichments. However, SNPsnap only returns a frequency-matched background of at most 20,000 SNPs. I don't need this, because I believe I already have the most suitable background set (set_B), and in any case my background SNPs must lie within the ChIP-seq peaks.

Additionally, SNPsnap seems quite experimental (about 80% of my runs have failed), and emails to the authors go unanswered, so I believe the program is not really supported.

I was therefore hoping someone here might have ideas on how to do this:

LD clumping: what if I mapped my set_A SNPs to strong-LD intervals and computed the enrichment of set_A LD blocks in GWAS LD blocks, rather than the enrichment of set_A SNPs in GWAS LD blocks? Alternatively, for each LD block containing more than one set_A SNP, I could select the "best" SNP according to some metric. Any other ideas or suitable tools?
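The second idea, one "best" SNP per LD block, could look roughly like the sketch below. It assumes each SNP has already been assigned to an LD block by some upstream step and carries a numeric score where higher is better. The function name, the tuple layout, and the scores are hypothetical.

```python
def pick_block_representatives(snps):
    """Return one SNP per LD block, choosing the highest-scoring SNP.

    snps -- iterable of (snp_id, block_id, score) tuples, where block_id
            comes from a prior LD-block assignment and score is any metric
            for which larger means "better" (for example -log10 of a QTL p-value)
    """
    best = {}
    for snp_id, block_id, score in snps:
        current = best.get(block_id)
        # On ties the SNP seen first is kept.
        if current is None or score > current[1]:
            best[block_id] = (snp_id, score)
    return {block_id: snp_id for block_id, (snp_id, _) in best.items()}


# Toy input with made-up identifiers, block labels, and scores.
toy_snps = [
    ("rs1", "block1", 0.5),
    ("rs2", "block1", 0.9),
    ("rs3", "block2", 0.1),
]
print(pick_block_representatives(toy_snps))
```

After this step each block contributes a single SNP, so counts of SNPs inside GWAS LD blocks are effectively counts of blocks. The background set would need the same treatment, otherwise the two sets are no longer comparable.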

Thanks for any suggestions you might be willing to share.

ChIP-Seq SNP • 3.8k views

updated 2.7 years ago by Ram 45k • written 10.2 years ago by epaminonda ▴ 10