Mapping SNPs To Pathways
12
13
14.7 years ago

Hi all, given a set of SNPs, what would be your favorite way to find their related pathways/diseases?

Thanks

snp genotyping pathway gene enrichment • 18k views
ADD COMMENT
0

What does it mean for a SNP to be related to a pathway?

ADD REPLY
0

@giovani e.g. "this subset of SNPs (localized on genes G1, G2, ...) has been described as involved in the metabolism of 'X'".

ADD REPLY
16
14.6 years ago

Directly mapping SNPs to a particular disease or pathway may seem trivial, but in practice it is a tough task. GWAS have associated many SNPs with different phenotypes, and a good number of these SNPs are not within genes or other annotated genomic elements. That doesn't mean these SNPs play no role in a pathway or disease responsible for the phenotype. The study on the 9p21 locus is an excellent example. A list of SNPs associated with diseases/traits via GWAS is maintained here.

A SNP in a non-coding region may well affect neighboring genes, but ID mapping usually misses this, so I don't think direct ID mapping will give accurate results for all SNPs. If the SNP lies within the coding segment of a gene, direct mapping makes sense; otherwise it may not give exact results, though it can still be an excellent starting point.
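
To make the point concrete, here is a minimal sketch of positional mapping with and without a flanking window. The gene names, coordinates, SNP IDs and the 10 kb window are all hypothetical toy values, not real annotations, and a window is only a crude stand-in for real regulatory evidence.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def map_snp_to_genes(chrom, pos, genes, flank=0):
    """Return genes whose span, widened by `flank` bp per side, contains the SNP."""
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]


# Hypothetical toy annotation and SNP positions (not real coordinates).
genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 20_000, 30_000),
]
snps = {"rs_in_gene": ("chr1", 2_500), "rs_intergenic": ("chr1", 12_000)}

# Strict mapping: only SNPs inside a gene span get a gene.
strict = {rs: map_snp_to_genes(c, p, genes) for rs, (c, p) in snps.items()}
# Flanked mapping: also report genes within 10 kb of the SNP.
flanked = {rs: map_snp_to_genes(c, p, genes, flank=10_000) for rs, (c, p) in snps.items()}

print(strict)   # the intergenic SNP maps to no gene
print(flanked)  # the intergenic SNP now picks up both neighbours
```

With strict mapping the intergenic SNP is silently dropped; with the window it is assigned to both neighbours, which is exactly why the result is a starting point rather than an answer.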

ADD COMMENT
1

I mean SNPs outside the coding regions, i.e. those in the non-coding regions. A recent review in Nature Reviews Genetics (http://www.nature.com/nrg/journal/v11/n8/abs/nrg2814.html) is a good place to start for understanding non-coding regions in the genome.

ADD REPLY
1

11 months later, I'm validating the answer with the highest score :-)

ADD REPLY
0

It's been a while since I've dusted off my biology: What do you mean by "a good number of these SNPs are not with in the genes"?

ADD REPLY
0

Thanks for the link associating disease w/ SNPs. Is this basically what companies like 23andMe use?

ADD REPLY
5
14.7 years ago
David Nusinow ▴ 260

I haven't used it myself, but GRAIL was built for this sort of problem in GWAS. It looks pretty impressive from what I've seen.

ADD COMMENT
5
14.2 years ago

This is not an easy question because there are many different ways to consider SNPs. For me, simply mapping a SNP to the gene in which it resides, or to a nearby gene, can be misguided. Take for example the variants linked to lactase persistence in Whites and some Africans. These variants lie roughly 14 to 22 kbp upstream of the pertinent gene LCT (lactase), but actually map within MCM6 (minichromosome maintenance complex component 6). As an aside that pertains to my line of work: this is important when drawing up dietary recommendations. Gene Ontology terms for LCT are:

  • Molecular Function: cation binding, glycosylceramidase activity, lactase activity, transferase activity
  • Biological Process: carbohydrate metabolic process, response to drug, response to estrogen stimulus, response to ethanol, response to hormone stimulus, response to hypoxia, response to lead ion, response to nickel ion, response to nutrient, response to starvation, response to sucrose stimulus
  • Cellular Component: apical plasma membrane, brush border, integral to plasma membrane, membrane fraction, plasma membrane

While the GO terms for MCM6 clearly indicate a different function of the encoded protein:

  • Molecular Function: ATP binding, DNA binding, DNA helicase activity, identical protein binding, nucleotide binding, protein binding, single-stranded DNA binding
  • Biological Process: DNA replication, DNA unwinding involved in replication, DNA-dependent DNA replication initiation, cell cycle, regulation of transcription
  • Cellular Component: nucleoplasm, nucleus

OK, we know from a lot of other evidence that the SNPs conferring lactase persistence would "map" or be assigned to a lactase pathway. But where to assign other SNPs? Khader is right, mapping to disease pathways based on GWAS results is one option, but one may want more detail or assignment to a different kind of pathway, e.g., biochemical or physiological. In essence, this comes down to allele-specific pathways and pathway fluxes: different alleles of one SNP may alter transit through that node in the pathway by a mere 10-25%, and that could be significant over the years it takes to see the phenotypic effects of a disease. Few such pathways or pathway fragments exist. It also brings up cell-type- or organ-specific pathways. In this regard, I may be able to call up from KEGG, Reactome or other sources a list of inflammation genes, which would be quite important because adipose tissue is 10% macrophages in a lean individual but 40% in an obese person. However, I do not know which members of that inflammation pathway are actually relevant and expressed in adipose tissue.

In addition, a recent paper by Folkersen (Circ Cardiovasc Genet 3:365) shows that many disease SNPs for cardiovascular disease phenotypes map far from the gene whose mRNA levels associate with that SNP. Again, it is a gene expression thing similar to the LCT-MCM6 story above. In all, this is tough and there is no satisfactory way to assign a SNP to a pathway. Assignment can be easier based on genetics - GWAS and classical mapping and mouse KOs - but those too may be population specific or altered by environment.

ADD COMMENT
4
14.7 years ago
Michael 55k

There are actually two questions in

related pathways/diseases?

The first part can be solved by database queries such as BioMart and KEGG, but the second part requires complex studies. Actually, IMHO, a large part of the already known SNPs are not connected to disease; they might not even have a phenotype (I would bet >99%). As far as I understand, the known SNPs were sampled from "healthy" individuals and represent a large mix, so it seems reasonable to assume that they are not easily connected to diseases.

In short, the answer might be exome sequencing of affected individuals. I found this recent article which I think is really great to answer this question:

Ng SB, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010 Jan;42(1):30-5. Epub 2009 Nov 13.

Briefly, they looked for point mutations shared by the few affected individuals and subtracted synonymous SNPs and already known SNPs until only one gene remained.
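
The filtering idea can be sketched roughly as below. This is only my reading of the strategy described above, not the paper's actual pipeline; the variant records, gene names and the `known_snps` set are hypothetical toy data.

```python
def candidate_genes(affected, known_snps):
    """Genes carrying a novel, non-synonymous variant in every affected individual."""
    per_person = []
    for variants in affected:
        genes = {
            v["gene"]
            for v in variants
            # Drop synonymous changes and variants already catalogued as known SNPs.
            if v["effect"] != "synonymous" and v["id"] not in known_snps
        }
        per_person.append(genes)
    # Keep only genes that survive the filter in all individuals.
    return set.intersection(*per_person) if per_person else set()


# Hypothetical variant calls for three affected individuals.
affected = [
    [{"id": "var1", "gene": "GENE_X", "effect": "missense"},
     {"id": "rs_known", "gene": "GENE_Y", "effect": "missense"}],
    [{"id": "var2", "gene": "GENE_X", "effect": "nonsense"},
     {"id": "var3", "gene": "GENE_Z", "effect": "synonymous"}],
    [{"id": "var4", "gene": "GENE_X", "effect": "missense"},
     {"id": "var5", "gene": "GENE_Z", "effect": "missense"}],
]
known_snps = {"rs_known"}

result = candidate_genes(affected, known_snps)
print(result)
```

In this toy case only GENE_X survives, because it is the only gene with a novel protein-changing variant in all three people. Note that the variants need not be identical across individuals, only in the same gene.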

ADD COMMENT
0

Exome sequencing has clear utility for familial (Mendelian) disorders, where it has become the first-choice method for identifying causative variants. However, the targets for GWAS studies are usually common variants, which by definition will not cause the rare highly penetrant heritable risk. Many methods will be required to identify all of the heritable risk.

ADD REPLY
3
14.7 years ago
Manuel Corpas ▴ 650

I would use DAS (Distributed Annotation System) to retrieve all genes/phenotypes associated with a specific SNP.

DAS is a web service for decentralised annotation that provides an easy protocol for retrieving features given a URL.

For example, to retrieve all OMIM genes on chromosome 18 between base pairs 1 and 1000000:

http://das.sanger.ac.uk/das/ens_36_omim_genes/features?segment=18:1,1000000

More on DAS here
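
As a rough illustration, a `features` request URL of this shape can be assembled with a small helper. The function name is my own, it only builds the string and makes no network call, and I have not checked whether this particular server still answers.

```python
def das_features_url(base, source, chrom, start, end):
    """Build a DAS `features` request URL for one segment (chrom:start,end)."""
    return f"{base}/{source}/features?segment={chrom}:{start},{end}"


# Server and source name are taken from the example above.
url = das_features_url(
    "http://das.sanger.ac.uk/das", "ens_36_omim_genes", 18, 1, 1_000_000
)
print(url)
```

For a single SNP you would pass a narrow segment around its position instead of the whole first megabase.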

ADD COMMENT
1

Thanks, but your system just finds the genes in a given region. (To do this I would simply use the UCSC MySQL anonymous server with 'select distinct G.name from knownGene as G, snp130 as S where G.chrom=S.chrom and G.txStart<=S.chromStart and G.txEnd>=S.chromEnd and S.name in("rs1","rs2"...)'.) Here I want to mine the pathways and/or the diseases. For example: "this subset of SNPs is involved in the metabolism of XXXX".
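
For anyone who wants to try that join without connecting to UCSC, here is a small sketch that runs the same logic against an in-memory SQLite database. Only the four columns used in the join are created, and every row (gene IDs, rs IDs, positions) is a hypothetical toy value rather than real UCSC data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Cut-down stand-ins for the UCSC knownGene and snp130 tables.
    CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
    CREATE TABLE snp130 (name TEXT, chrom TEXT, chromStart INTEGER, chromEnd INTEGER);
    INSERT INTO knownGene VALUES ('uc_toy1', 'chr1', 1000, 5000);
    INSERT INTO knownGene VALUES ('uc_toy2', 'chr2', 1000, 5000);
    INSERT INTO snp130 VALUES ('rs_toy1', 'chr1', 2500, 2501);
    INSERT INTO snp130 VALUES ('rs_toy2', 'chr3', 2500, 2501);
    """
)

# Same join as in the comment, with the chromosome match made explicit.
query = """
    SELECT DISTINCT S.name, G.name
    FROM knownGene AS G
    JOIN snp130 AS S
      ON G.chrom = S.chrom
     AND G.txStart <= S.chromStart
     AND G.txEnd >= S.chromEnd
    WHERE S.name IN (?, ?)
    ORDER BY S.name, G.name
"""
rows = conn.execute(query, ("rs_toy1", "rs_toy2")).fetchall()
print(rows)
conn.close()
```

Without the `G.chrom = S.chrom` condition, the SNP on chr3 would wrongly match the genes on other chromosomes that happen to share the same coordinates.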

ADD REPLY
3
14.7 years ago
Andrew Su 4.9k

Biomart's Martview (http://www.biomart.org/biomart/martview/) will get you from SNP IDs to many gene/protein identifiers. In a second step, Martview will also get you from gene IDs to GO Biological Process terms, but there are probably better tools that are specifically targeted toward pathways (KEGG, Reactome, WikiPathways, etc.)

ADD COMMENT
3
14.7 years ago

Pierre,

As soon as you have the Entrez Gene IDs for your SNPs, you can query KEGG or WikiPathways, which should provide the Entrez Gene IDs belonging to a given pathway. The good thing about these two websites is that, with some SVG, you can customize the graphical view of the pathways to highlight the genes that carry the SNPs. Hope this helps.
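
Once both lookups are downloaded, the chaining itself is short. A minimal sketch, assuming you already have a SNP-to-gene table and a pathway-to-genes table as plain dictionaries; all IDs and pathway names below are hypothetical placeholders.

```python
from collections import defaultdict


def pathways_hit(snps, snp_to_gene, pathway_to_genes):
    """Map each pathway to the set of its genes that carry at least one input SNP."""
    hits = defaultdict(set)
    for rs in snps:
        gene = snp_to_gene.get(rs)
        if gene is None:
            continue  # SNP has no gene assignment (e.g. intergenic)
        for pathway, genes in pathway_to_genes.items():
            if gene in genes:
                hits[pathway].add(gene)
    return dict(hits)


# Hypothetical lookups, e.g. exported from a SNP annotation and a pathway database.
snp_to_gene = {"rs_a": "1001", "rs_b": "1002", "rs_c": "1001"}
pathway_to_genes = {
    "toy_pathway_1": {"1001", "1003"},
    "toy_pathway_2": {"1002", "1001"},
    "toy_pathway_3": {"1004"},
}

hits = pathways_hit(["rs_a", "rs_b", "rs_c", "rs_unmapped"], snp_to_gene, pathway_to_genes)
print(hits)
```

The gene sets returned per pathway are the ones you would then highlight in the pathway drawing.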

Fred

ADD COMMENT
2
14.2 years ago
jvijai ★ 1.2k

There are several implementations of the GSEA methodology:

  1. GSEA Mootha et al.
  2. GSEA Wang et al.
  3. MAGENTA Segre et al.
  4. VEGAS Liu et al.
  5. ALIGATOR Holmans et al.
  6. GRASS Chen et al.
ADD COMMENT
1

Vijai: Please merge your answers into one. You could also provide links to the manuscripts.

ADD REPLY
2
14.2 years ago
Austinlew ▴ 310

Please have a look at this program, which includes pathway analysis: http://www.openbioinformatics.org/gengen/

ADD COMMENT
2
14.2 years ago
Paul Shapiro ▴ 20

I believe it is best to describe individual SNPs in every way imaginable: map location, gene-centric position (in the CDS, 5 kb upstream of the ORF, etc.), pathway involvement (if known) and finally disease/phenotype involvement (if known).

The next level of complexity arises when one wants to describe SNPs whose penetrance is modified by other factors (genetic or epigenetic), but that may be beyond the scope of this discussion.

ADD COMMENT
0

Indeed, there are many annotations one can add to a SNP or its alleles.

ADD REPLY
2
13.8 years ago
Vova Naumov ▴ 220

Gene Set Analysis Toolkit V2 http://bioinfo.vanderbilt.edu/webgestalt/

You can just upload a text file containing a list of rs IDs, one per line.

As a result, it gives KEGG pathways or WikiPathways with colored genes/proteins, sorted by the number of genes each of them contains.
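
Sorting by raw gene counts tends to favour large pathways, so it can help to put a p-value next to each count. Below is a minimal sketch of a one-sided hypergeometric over-representation test using only the standard library. It is a generic calculation, not a description of what the toolkit does internally, and the numbers in the example are made up.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when drawing n genes from N, of which K are in the pathway."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Made-up example: 20,000 background genes, a 100-gene pathway,
# 50 genes in the SNP-derived list, 5 of them in the pathway.
p_value = hypergeom_sf(k=5, N=20_000, K=100, n=50)
print(f"p = {p_value:.3g}")
```

With many pathways tested at once, the resulting p-values still need a multiple-testing correction.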

ADD COMMENT
1
14.2 years ago
jvijai ★ 1.2k

I have used ALIGATOR (pdf here) to find pathway enrichment from GWAS SNP data. It tests Gene Ontology categories for overrepresentation based on SNP p-values. Program link

GRASS is another method, based on ridge regression, that uses SNP data to find pathway enrichment. I believe a new R package is out for this now.

I am not sure if this is what Pierre is looking for.

ADD COMMENT
