Question

What protein structure do human proteins refer to?

9.8 years ago

jcrapser ▴ 10

Hello, I have a question about the semantics of different possible versions of proteins. In mice, a particular protein will have a given amino acid sequence, so when I talk about, say, IL-10 in C57BL/6 mice, I'm referring to a specific primary structure.

My confusion is about the human correlate of this: different alleles of a gene can code for different versions of a protein with slightly different amino acid sequences. So when people refer to IL-10 (or another protein) in humans, either on Wikipedia or in clinical studies, what version of the protein are they talking about? Are they talking about the most common (wild-type) version of the protein, or about all major polymorphisms, assuming they don't differ much in function (as, say, APOE versions would)? You can look up a human protein on UniProt and it'll give you the amino acid length, where it's phosphorylated, etc., but is this for a particular version of the protein coded by a certain allele, or is it describing all variants of a given protein?

I guess more specifically, how can a study describe the structure of a given human protein (amino acid sequence, secondary structure etc.) if these proteins can exist as different versions through different alleles in a population?

I hope that makes sense - thanks in advance!

protein protein-structure • 3.7k views

updated 2.4 years ago by Ram 45k • written 9.8 years ago by jcrapser ▴ 10

score 3 · Answer 1 · 2015-11-16

In general, biologists have a fuzzy notion of what a gene is, and when they use a gene name to refer to a protein it can mean any or all of the possible isoforms, including proteins produced by different loci. If you want to know which isoform a particular structure corresponds to, you have to look up that structure in the relevant database, e.g. PDB. Also, many specialist databases have no underlying notion of a genome. UniProt, for example, is only a database of proteins: it gives you information about a particular protein sequence or its variants but knows nothing about the genome that produced it. If you want to know which allele corresponds to which protein, you need to look at annotated genomes like those in Ensembl.

score 1 · Answer 2 · 2015-11-16

To add to what Jean states:

Typically it is assumed that when two proteins share the same name they're orthologs (genes descended from a common ancestral gene by speciation) and therefore share the same function in each host. However, "same" is a loaded word: differences in the proteins may lead to different behavior, or the proteins may behave the same while their neighborhoods differ (e.g. one host may lack an interacting protein).

When it comes to polymorphism, unless the study specifically mentions it, you should assume that they're talking about the wild type and that they're assuming any polymorphisms that exist are not important. This isn't always true (just look at the field of cancer), but it is the starting assumption.

As for mice, you can rely on this assumption more strongly only because the animals are highly inbred. You can also make that assumption only within a single strain of mice: compare C57BL/6 with BALB/c and you're going to get differences. Other model organisms are less inbred or not inbred at all.

You should also note that there's a major difference between alleles of a gene and isoforms of a gene. Alleles are heritable units; an isoform is a result of splicing. You can have alleles that change which isoforms are possible, but for a given variant of a gene, the associated isoforms are a result of differential splicing.

In other words, if wild-type gene X lists isoforms X1, X2, X3, these isoforms are due to alternative splicing, not genetic differences: each isoform came from the same gene.
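To make the allele/isoform distinction concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical: the exon sequences, the names `WILD_TYPE`, `VARIANT`, and `ISOFORMS`, and the helper functions are made up for illustration and do not describe any real gene. It treats an allele as a list of exon sequences and an isoform as a choice of which exons to join.

```python
# Toy model (all sequences are made up): an allele is a list of exon
# sequences, and an isoform is a selection of exons spliced together.
WILD_TYPE = ["ATGGCT", "GGAAAA", "TTTCCC", "TGA"]  # hypothetical exons of gene X
VARIANT = ["ATGGCT", "GGAGAA", "TTTCCC", "TGA"]    # same gene, one base changed in exon 2

# Splicing patterns, given as exon indices. These are the same for both alleles.
ISOFORMS = {"X1": [0, 1, 2, 3], "X2": [0, 1, 3], "X3": [0, 2, 3]}


def splice(allele, exon_indices):
    """Join the selected exons of one allele into a single transcript."""
    return "".join(allele[i] for i in exon_indices)


def transcripts(allele):
    """Return every isoform transcript that one allele can produce."""
    return {name: splice(allele, idx) for name, idx in ISOFORMS.items()}


wt = transcripts(WILD_TYPE)
var = transcripts(VARIANT)
for name in ISOFORMS:
    print(name, "differs between alleles:", wt[name] != var[name])
```

In this sketch the three isoforms of the wild-type allele differ from each other purely through exon choice, while the variant allele changes only those isoforms that include the altered exon.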

Now, as to why you can say "here is the structure of X": you can because that's what the community is calling X. It might seem arbitrary (it is), but in order to call variants, you need to have something to compare against. Additionally, if everyone were using different versions of a standard, people would get different results. This idea of a reference is important in a general sense: if everyone were working on a different flavor of IL10 and didn't know it, nothing would get done.
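The point that calling variants requires a reference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peptide strings are made up (not real IL10 sequence), `call_variants` is a hypothetical helper, and it handles substitutions only, not insertions or deletions.

```python
def call_variants(reference: str, observed: str) -> list[str]:
    """List substitutions in `observed` relative to `reference`.

    Each variant is written as <reference residue><1-based position><observed residue>.
    """
    if len(reference) != len(observed):
        raise ValueError("this sketch only handles equal-length sequences (no indels)")
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference, observed), start=1)
        if ref != obs
    ]


# Made-up toy peptide standing in for the agreed-upon reference sequence.
REFERENCE = "MKTAYLLG"

print(call_variants(REFERENCE, "MKTAYLLG"))  # []
print(call_variants(REFERENCE, "MKSAYLRG"))  # ['T3S', 'L7R']
```

A label like "T3S" only means something relative to the reference it was called against, which is why two groups using different references would report different variants for the same protein.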

score 0 · Answer 3 · 2015-11-17

Okay - thanks for the help guys! So generally studies and textbooks will be referring to the "wild-type" version of the protein unless they explicitly state otherwise. So, for instance, a CD4 polymorphism exists in some populations, and while technically that is a CD4 protein, generally when people discuss CD4 function they will be referring to the wild-type version that most people carry?

One more question - some proteins are highly polymorphic (like MHC class II). In that case, does discussing MHC class II refer not to a canonical amino acid sequence but rather to the molecule and all of its different polymorphs in an individual?

It gets confusing. During undergrad you're taught about all these proteins and their functions in classes and in textbooks, but they can exist with different amino acid sequences in humans, so it's hard to understand whether the textbooks are referring to one single protein or to a conceptual idea of a protein and its function that may differ based on primary structure.