Analysis of SNPs in Gene Deserts
14.1 years ago

I have identified multiple causal SNPs in gene deserts for my phenotype of interest.

What bioinformatics, statistical genetics, or experimental approaches can I explore to associate these SNPs in gene deserts with my phenotype?

Please share your thoughts and related literature on analyzing such gene deserts and SNPs in gene deserts.

snp annotation gwas • 5.9k views
ADD COMMENT
3

I feel it is better to call the SNP associated rather than causal. "Causal" implies something more definitive regarding function.

ADD REPLY
0

When you say causal along with non-genic, it is not clear what you mean. Is there a specific functional motif that is changed by the SNP, or is it only associated with the phenotype?

ADD REPLY
0

I mean the SNP is in a region of the genome where no known gene is annotated. The terms "gene desert" and "non-genic region" are generally used for such regions, for example: http://genome.cshlp.org/content/20/9/1191.full http://genome.cshlp.org/content/15/1/137.full

ADD REPLY
0

I called it causal because of its significant p-value and OR.

ADD REPLY
0

So correlation == causation these days?

ADD REPLY
0

@Vijai / Adrian / Aaron : Question edited.

ADD REPLY
5
14.1 years ago

You and everyone else doing GWAS....

In any case, besides looking for ncRNAs and undocumented transcripts, there is a load of data from ENCODE and related projects, most accessible through the UCSC genome browser. In particular, you could look at the overlap between your SNP and DNase hypersensitivity sites, transcription factor binding sites, maximally conserved elements, etc. If you want to get really crazy, you could look at the Hi-C paper by Dekker et al. that purports to give long-range genomic interactions between various regions of the genome.
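To make the overlap idea concrete, here is a minimal sketch in Python. It assumes you have exported annotation tracks as BED-style lines (chrom, start, end, name) and have 1-based SNP positions. The function names, feature names, and coordinates are all made up for illustration; this is not a UCSC or ENCODE API.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED-like lines (chrom, start, end, name) into {chrom: [(start, end, name), ...]}."""
    features = defaultdict(list)
    for line in lines:
        # Skip blank lines and the header lines that browser exports may contain.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split()[:4]
        features[chrom].append((int(start), int(end), name))
    return features


def annotate_snps(snps, features):
    """Return {snp_id: [names of features overlapping the SNP]}.

    snps: iterable of (snp_id, chrom, pos) with a 1-based position.
    BED intervals are 0-based and half-open, so a 1-based position
    overlaps an interval when start < pos <= end.
    """
    hits = {}
    for snp_id, chrom, pos in snps:
        hits[snp_id] = [
            name
            for start, end, name in features.get(chrom, [])
            if start < pos <= end
        ]
    return hits


# Hypothetical toy data, not real annotations.
bed = [
    "chr8\t1000\t1500\tDNase_HS_1",
    "chr8\t1400\t1600\tTFBS_example",
    "chr8\t5000\t5200\tconserved_1",
]
snps = [("rsA", "chr8", 1450), ("rsB", "chr8", 1500), ("rsC", "chr8", 3000)]

result = annotate_snps(snps, load_bed(bed))
for snp_id, names in result.items():
    print(snp_id, names if names else "no overlapping feature")
```

The linear scan is fine for a handful of GWAS hits. For genome-wide sets, an interval tree or a dedicated tool would be a better fit.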

ADD COMMENT
0

Thanks a lot Sean. I will explore the ENCODE data and DNase hypersensitivity tracks from the UCSC genome browser, along with the Hi-C data. A quick search shows that the Hi-C associated resources don't have a way to browse the data. By maximally conserved elements, do you mean those indicated in Vista (http://genome.lbl.gov/vista/) and related comparative genomics resources?

I am already running a TFBS search in the region.

ADD REPLY
0

There are several conservation tracks at UCSC, including the phastConsElements tables. As for TFBS, you could also look at the tfbsConsSites table. My point in the answer above was that there are other genomic features besides genes that might be of interest. And, unfortunately, as you note with the Hi-C data, accessing them may not always be straightforward.

ADD REPLY
0

Thanks Sean. I am currently exploring ENCODE data and other tracks in UCSC.

ADD REPLY
3
14.1 years ago
Gww ★ 2.7k

Perhaps you could use some RNA-mapping technique to look for non-coding RNAs containing the SNP you are interested in. 3'-/5'-RACE could be suitable. At least from this you could get a general idea of whether the SNP is associated with a transcript. However, your SNP could be part of some long-range cis-enhancer region like they found in this paper, which could make experimental validation more complicated.

You could also try mining non-coding RNA lists in various databases such as Ensembl (e.g. small RNA, lincRNA, etc.) to see if any of them map to that region as well.

ADD COMMENT
0

Thanks GWW. I am exploring the ncRNA / miRNA track now.

ADD REPLY
2
14.1 years ago

Well, Khader, you should have been at ASHG last week - and should have then run from talk to talk where all kinds of folks presented very similar situations. The MYC paper GWW cites is an example that comes to mind as well. In fact, I saw talks by two of those authors. I also saw a presentation of using ENCODE data to annotate the GWAS hits - so I agree with Sean's excellent idea. My notes from these talks are on my blog: Wasserman, Degner's use of ENCODE data, and Stamatoyannopoulos' talk on using ENCODE.

This is just a start, though. There are certainly a lot of ideas on this topic. Generally, SNPs in gene deserts either regulate expression of a distant protein-coding gene or lie in or near a non-protein-coding (RNA) gene. Ideally, you would have expression data, either from a genome-wide chip of mRNA probes or from RT-PCR across the region (in which you can identify those novel transcription products), to compare to the GWAS data to look for a type of eQTL.
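As a rough illustration of what "looking for an eQTL" means at its simplest, the sketch below regresses a transcript's expression on genotype dosage (0, 1, or 2 copies of the minor allele) for one SNP. The function name and the numbers are made up. A real analysis would add covariates, a significance test, and multiple-testing correction, none of which are shown here.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson r of expression regressed on genotype dosage.

    dosages: minor-allele counts (0, 1 or 2), one per sample.
    expression: expression values for the same samples, in the same order.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage/expression values for at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("SNP is monomorphic or expression is constant in these samples")
    slope = sxy / sxx  # expression change per extra copy of the allele
    r = sxy / (sxx * syy) ** 0.5
    return slope, r


# Hypothetical toy data: six samples, expression rising with allele count.
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope, r = eqtl_slope(dosages, expression)
print(f"slope per allele: {slope:.3f}, Pearson r: {r:.3f}")
```

A slope clearly different from zero for a gene some distance from the SNP would be the kind of signal that suggests a regulatory effect worth following up.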

Added on 12 Jul 2011: The continued evolution of the epigenomebrowser.org site makes it a very good place to go to get the data mentioned here. There are now some very nice displays of Stamatoyannopoulos' DNaseI hypersensitivity sites, tissue-specific histone methylation and acetylation marks as well as transcription factor ChIP data.

ADD COMMENT
0

True Larry, I should have attended ASHG. Thanks for your points. I had a cursory look at your blog posts during the #ashg2010 tweets. I will read them in detail now. Thanks for the point on expression data. I will check this; eQTL is another interesting idea. Do you know of any curated or automated database of eQTLs?

ADD REPLY
0

No, I know of no such database. It is unfortunate that these data end up in supplementary files that really need to be swept into a larger database. Creating such a database should be a priority of funding at NIH, IMHO.

ADD REPLY
0

Thanks Larry. You are right, a literature-curated eQTL database should be a priority for the funding agencies.

ADD REPLY
2
13.5 years ago
Adrian Cortes ▴ 550

In my opinion you have to be careful when you say "you have identified a causal SNP". From your description you have identified a region of high association with no transcript annotation, AKA a gene desert. This is still not a causal SNP, as you don't know how this variation affects your phenotype or whether a variant in LD with your candidate is the true causal SNP.

As was already suggested in some of the answers, you can combine your genotypes with expression data; here is a nice example in the literature.
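To see why LD matters here, a quick check is to compute r² between your candidate and nearby variants. The sketch below uses the squared Pearson correlation of genotype dosages, which is a common approximation when haplotype phase is unknown. The function name and genotypes are made up for illustration.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNPs' genotype dosages (0/1/2).

    This approximates LD r^2 from unphased genotypes; it is not computed
    from phased haplotype frequencies.
    """
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need matched genotypes for at least 2 samples")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    saa = sum((a - mean_a) ** 2 for a in dosages_a)
    sbb = sum((b - mean_b) ** 2 for b in dosages_b)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    if saa == 0 or sbb == 0:
        raise ValueError("one of the SNPs is monomorphic in these samples")
    return (sab * sab) / (saa * sbb)


# Hypothetical genotypes for a candidate SNP and two neighbours.
candidate = [0, 1, 2, 1, 0, 2]
neighbour_tight = [0, 1, 2, 1, 0, 2]  # identical genotypes in every sample
neighbour_loose = [1, 0, 1, 2, 1, 1]

r2_tight = genotype_r2(candidate, neighbour_tight)
r2_loose = genotype_r2(candidate, neighbour_loose)
print(f"r2 with tightly linked neighbour: {r2_tight:.2f}")
print(f"r2 with loosely linked neighbour: {r2_loose:.2f}")
```

Any neighbour with r² close to 1 is statistically indistinguishable from the candidate in the association test, so it is an equally plausible causal variant until functional evidence says otherwise.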

You can also take a look at this eQTL browser.

Cheers!

ADD COMMENT
1
14.1 years ago

Just to add a different point - one probably already known to Khader. This type of question is being considered at WikiGenes and the article here. Many of us (those into genome variation) either need to join this effort or keep abreast of what they write.

ADD COMMENT
0

Sure Larry. I would like to add a few points that I learned from my interaction with BioStar; I hope they will be useful for those who are interested in post-GWAS analysis. I will be adding my ideas on the discussion page. See you at the WikiGenes page.

ADD REPLY
