Question

How to decide how broken a gene is

4

9.1 years ago

entheologist33 ▴ 100

I need a way to assign each gene a number representing how broken it is, depending on how many SNPs with minor alleles appear in the gene. This would be based on just a handful of SNPs though, not every SNP.

So let's say for example I know a few SNPs for COMT:

Rs165599     AA
Rs6269       AA
Rs737865     AA
Rs4633       TT
Rs769224     GG
Rs4680       AA

and some SNPs for STAT3:

Rs744166      GG
Rs8069645     AG
Rs9891119     CC
Rs6503691     CC

and I wanna find out which gene is more defective (as in which has the most diminished ability to encode the corresponding protein), how can I do that? These genes obviously have way more SNPs than just those ones, and in every gene I will have many minor alleles, so I'm guessing it's only certain SNPs that cause problems for the gene. So could I find a number that tells me how much a bad allele of a particular SNP breaks the gene, and then use that number to get an overall idea of how malfunctioning the gene is?

gene SNP • 3.0k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.1 years ago by entheologist33 ▴ 100

1

Thanks for the answers, I didn't know about VEP, SIFT and PolyPhen. I tend to use every one of those words you listed. I'm not so good with euphemisms but point received, I have to be more considerate with my terminology :) So I see that VEP tells you what type of variant it is, so I'm guessing with nonsense variants you can assume the gene's function is gonna be greatly altered. I couldn't figure out if they assign a number as an estimate of how greatly the gene's function is altered by the allele. Do they actually have that?

Also is there a database which has these kinds of values determined through scientific studies (as opposed to computational methods, which I'm assuming VEP uses)? And I wonder how this would work with multiple SNPs, would you add the values together? Like for example let's say in my COMT gene I have a minor allele on Rs165599 which is known to reduce the gene's ability to produce COMT by 10%. Then on Rs6269 I have two minor alleles which reduce the gene's transcription abilities by 20% for each allele. Would this add up to 10% + 20% + 20% = 50%, meaning that the gene now only produces half as much COMT as a gene without these variants? I know things are gonna be way more complex than this, but could you make a rough estimate using this logic?

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by entheologist33 ▴ 100

3

Unfortunately it doesn't work like this for two main reasons:

The first is that going from genomic code to phenotype is a huge, huge problem, one that humanity is nowhere near solving (if it ever will). The programs mentioned, although very good at what they do all things considered, cannot be used as "evidence" for anything because the scores they give are almost meaningless. When you put sequencing data through them in cases where the causative variant is known, SIFT/PolyPhen either ranks the variant right at the top (if it's a premature stop) or right at the bottom (if it's basically anything else). This is because there are lots of things that can go wrong in mapping that result in things that look like frameshifts, and there are plenty of non-synonymous variants which do nothing but look scary - and the true variant is a change in some promoter/enhancer site, not even registered by these programs as a problem (possibly because the site isn't even known). Because there is so much noise in this data, adding up scores for genes really just adds up noise - diluting out the real instances of true signal.

The second problem - which is really the first problem from a different angle - is that genetics doesn't work like this.

As far as evolution is concerned, two wrongs can make a right - for example deleting and then later adding a base could be counted as two very serious frameshift mutations; but might have absolutely no effect on the amino-acid sequence at all.
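The frameshift point can be sketched with a toy example. The sequence below is made up purely for illustration (it is not COMT or STAT3), and the positions of the deletion and insertion are arbitrary choices. A single-base deletion scrambles everything downstream, but a later single-base insertion puts the reading frame back, so only the residues between the two changes differ.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate complete codons starting at position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Hypothetical 7-codon coding sequence, for illustration only.
original = "ATGGCTGAAAAACTGGTTTGA"
deleted = original[:4] + original[5:]      # 1 bp deletion: frameshift
rescued = deleted[:8] + "C" + deleted[8:]  # later 1 bp insertion: frame restored

print(translate(original))  # MAEKLV*
print(translate(deleted))   # MVKNWF
print(translate(rescued))   # MVNKLV*
```

Scored one at a time, both changes look like severe frameshifts. Together they alter two residues and leave the rest of the protein, including the stop codon, intact.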

Alternatively, a premature stop mutation in one gene might be irrelevant since there are 5 other backup copies. Things like copy-number-variations may not be visible via sequencing at all, but have an obviously huge effect on the individual's phenotype. So basically, as I think I said in the other thread h.mon linked, these tools do a good job at trying to reduce some of this complexity - but these days you cannot publish a paper because SIFT gave it a really high score. You need direct evidence, because when you go fishing with high quality bait you'll always catch some kind of fish.

Also is there a database which has these kinds of values determined through scientific studies

Well, there's OMIM - Online Mendelian Inheritance in Man - but it's kind of old. The guy who started it, Dr McKusick, was the PI of my old PI, so I'm a little biased. There are probably other databases out there now that work with probability estimates and register small effects too, etc. I'd start by looking at all dbSNP has to offer by way of metadata.

...on Rs6269 I have two minor alleles which reduce the gene's transcription abilities by 20% for each allele? Would this add up to 10% + 20% + 20% = 50% meaning that the gene now only produces half as much COMT as a gene without these variants?

No way - because of point 2 again: a second change might rescue the phenotype, or might be unable to make it worse. Also, you have two copies of every gene - it would be important to know which copy the variant is on, because a 20% reduction in one copy might mean a 120% boost in the other copy to compensate (often the case).
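Even before rescue or compensation comes into it, the answer already depends on which combination rule you pick. A rough sketch using the hypothetical percentages from the question (they are not measured values for COMT); the multiplicative rule is an assumption added here only for comparison, and neither rule models the effects described above.

```python
# Hypothetical per-allele reductions taken from the question (not real measurements).
reductions = [0.10, 0.20, 0.20]

# Naive additive rule: subtract every reduction from 100%.
additive_remaining = 1.0 - sum(reductions)

# Independent multiplicative rule (an assumption): each variant scales what is left.
multiplicative_remaining = 1.0
for r in reductions:
    multiplicative_remaining *= 1.0 - r

print(f"additive:       {additive_remaining:.1%}")        # 50.0%
print(f"multiplicative: {multiplicative_remaining:.1%}")  # 57.6%
```

Two equally simple rules already disagree, and a compensating or rescuing variant would break both of them.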

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by John 13k

1

In addition to what has been said, the number of SNPs with minor alleles in a gene may also depend on how different the ethnicity of your sample is from the reference sequence.

ADD REPLY • link 9.1 years ago by rbagnall ★ 1.8k

Ram · Accepted Answer · 2015-12-06

1) You're looking for VEP and/or ClinVar.

2) Be careful with your terminology. Evolutionary biologists might argue SNPs do not break genes - they create them. But more importantly, people who carry variants (with or without a deleterious phenotype) don't like being called "broken", "bad", "defective", "diminished", "problematic" or "malfunctioning" ;-) But before you blush, don't sweat it - we're taught to think this way in school, and with good intentions, but it lands you in awkward situations like a presentation I once sat through re. how "defective individuals" carry the Sickle Cell gene, where a girl in the class had to leave because her mother suffers from the disease (she was therefore a carrier) and the whole thing was, rightly, upsetting.
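For the VEP suggestion above, here is a minimal sketch of looking up one rsID through the Ensembl REST service. The endpoint URL and the JSON field names (`most_severe_consequence`, `transcript_consequences`, `sift_score`, `polyphen_score`) are what VEP's JSON output is assumed to contain here, so check them against the current Ensembl documentation. `fetch_vep` and `summarize` are hypothetical helper names. SIFT and PolyPhen scores are only reported for missense changes, so most SNPs will come back without them.

```python
import json
import urllib.request

# Ensembl REST endpoint for VEP lookups by variant ID (assumed; verify against the docs).
VEP_URL = "https://rest.ensembl.org/vep/human/id/{rsid}?content-type=application/json"


def fetch_vep(rsid):
    """Fetch VEP annotation records for one rsID (needs network access)."""
    with urllib.request.urlopen(VEP_URL.format(rsid=rsid)) as response:
        return json.load(response)


def summarize(record):
    """Extract the most severe consequence plus any SIFT/PolyPhen scores present."""
    scores = []
    for tc in record.get("transcript_consequences", []):
        if "sift_score" in tc or "polyphen_score" in tc:
            scores.append(
                (tc.get("transcript_id"), tc.get("sift_score"), tc.get("polyphen_score"))
            )
    return {
        "id": record.get("id"),
        "most_severe_consequence": record.get("most_severe_consequence"),
        "scores": scores,
    }


# Usage (not run here because it needs network access):
# for record in fetch_vep("rs4680"):
#     print(summarize(record))
```

The consequence term tells you the type of variant, and the scores, where present, are per-transcript predictions rather than measurements of how much protein is made.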

Ram · Accepted Answer · 2015-12-07

In addition to what has already been suggested, there are also CADD and RVIS, which you should check out. CADD is much like SIFT and PolyPhen; RVIS is a statistic that estimates how tolerant or intolerant genes are to mutations, which is probably very important to take into consideration for what you are looking into.

The other thing to keep in mind is that not all mutations, even rare ones, will necessarily be detrimental to the protein's function. Some will have minor impacts, many will be essentially neutral in effect, and some may even enhance the protein's function. These are all important concepts from molecular evolution that aren't taken into account as often as they should be by many human geneticists.

score 1 · Accepted Answer · 2015-12-06

1

9.1 years ago

h.mon 35k

There are a number of suggestions in this thread.

ADD COMMENT • link 9.1 years ago by h.mon 35k