Does allele refer to the entire gene sequence or one position of the gene sequence?
7.6 years ago

Hi all,

I am a computer science student. I need some clarification on my understanding below.

  • Two alleles represent a gene. Genes are pieces of DNA. DNA makes up chromosomes. Chromosomes are found inside the nucleus of cells. In a diploid organism, the two corresponding genes in a chromosome pair (homologous chromosomes) are referred to as alleles. Each parent contributes one allele in each pair, so the two alleles might be identical or might have different base sequences.

1) Does allele refer to the entire gene sequence or one position of the gene sequence?

2) Is the meaning of "allele" different in Case 1 and Case 2?

Case 1: For example, consider the gene for height. Tt (tall) is the genotype, where T is one allele and t is the other. The entire gene has two alleles.


Case2:

I took the screenshot from a GATK VCF file. For example, at chr1 position 762,589 I have a G allele in the reference sequence and a C allele in the father and son. So the genotype for this position is G/C (heterozygous). Similarly, at other positions:

  • C/G genotype, C allele in reference, G allele in father and son
  • T/C genotype, T allele in reference, C allele in father and son
  • T/C genotype, T allele in reference, C allele in father and son
  • T/A genotype, T allele in reference, A allele in father and son

    Here we refer to an allele as one base of a gene. Therefore we have multiple alleles in the single gene "GLA".

[Image: screenshot of the GATK VCF file]

snps variant alleles • 6.8k views
7.6 years ago
Emily 24k

In undergrad we were taught the classical definition: the version of a gene that gives a fruit fly red eyes is one allele, whereas the version that gives it white eyes is another allele. This would be the definition that Mendel used, before the molecular basis of genetics was understood. It is often the definition used by people who don't really work in genetic variation or genomics, and who have a limited understanding of the kind of data we're using. It is basically obsolete.

Now that we understand the molecular basis of genetics (and have done for over 60 years), we say the alleles are the possible bases of a genetic variant. These can be single bases, in the case of a SNP, or many bases in the case of a structural variant. This is used consistently by biologists and bioinformaticians who work with genetic variation, and by all bioinformatic databases.
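
To make this usage concrete, here is a minimal Python sketch of how a VCF data line encodes alleles: index 0 is the REF allele and 1, 2, ... are the ALT alleles, and each sample's GT field is a pair of those indices. The line below is hypothetical, modelled on the position in the question (chr1:762589, G in the reference, C in father and son); the function names are my own and not part of any library.

```python
# Hypothetical VCF data line modelled on the position in the question.
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father son
VCF_LINE = "chr1\t762589\t.\tG\tC\t.\t.\t.\tGT\t0/1\t0/1"


def decode_genotypes(line):
    """Return (alleles, per-sample genotypes) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # Allele index 0 is REF; 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    gt_index = fields[8].split(":").index("GT")
    genotypes = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # "/" separates unphased allele indices, "|" phased ones.
        indices = gt.replace("|", "/").split("/")
        genotypes.append(
            tuple(alleles[int(i)] if i != "." else None for i in indices)
        )
    return alleles, genotypes


def variant_type(ref, alt):
    """Rough classification by allele length (ignores symbolic alleles like <DEL>)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "indel"


alleles, genotypes = decode_genotypes(VCF_LINE)
print(alleles)    # ['G', 'C']
print(genotypes)  # [('G', 'C'), ('G', 'C')]
```

Note that an allele here is just a string of one or more bases, so the same code handles a SNP (G vs C) and a deletion (AT vs A) without any special casing.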


Thank you Emily, I love it when things are put into historical context. That makes the process of (truly) understanding a complete one.

7.6 years ago

TL;DR

An allele is an abstract concept that is not well defined at the molecular level. Consequently you will encounter people who use it to mean either the whole sequence or the specific base. Both cases are informal usage; I lean more towards the whole sequence of the gene, but you will hear both.

What is a gene

From a formal genetics point of view, a "gene" is technically an abstract concept defined as a unit of inheritance: something that obeys Mendel's laws. Alleles are opposing versions of these units.

For most practical purposes we treat genes as synonymous with stretches of DNA that code for a set of overlapping RNA transcripts of similar sequence and function (technically a locus). This is because generally these stretches of DNA are inherited together. Exactly how this stretch of DNA is determined is not well defined and varies from project to project. But technically speaking it is wrong to say that that stretch of DNA is the gene. Of course we all do it all the time because it is convenient.

In some ways, the concept of a gene, or at least its equivalence to a stretch of DNA, is an outdated one, because we know that recombination can happen within any stretch of DNA, and thus what we refer to as a gene will not necessarily be inherited together.

A computer science analogy for a gene

A computer science analogy might be: A class is an abstract concept. In many languages, an instance of a class and all its members may reside at a particular location in memory, but we would not say that that location WAS the class.
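
A tiny Python sketch of that analogy, with a hypothetical `Gene` class and made-up sequences: two instances occupy two different places in memory, and neither place is the class itself.

```python
class Gene:
    """Stands in for the abstract concept, not for any stretch of memory."""

    def __init__(self, symbol, sequence):
        self.symbol = symbol
        self.sequence = sequence


# Two instances with identical contents (illustrative values only).
copy_a = Gene("B", "ACGT")
copy_b = Gene("B", "ACGT")

# Both objects are instances of the same class ...
same_class = type(copy_a) is type(copy_b) is Gene
# ... but they are distinct objects at distinct locations.
same_object = copy_a is copy_b

print(same_class, same_object)  # True False
```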

Alleles

The difference between two alleles is caused by changes to the base sequence of the DNA. But an "allele" is neither a version of the changed base itself, nor the locus we are saying is equivalent to a gene, but the abstract gene itself: it is the B or the b in your Punnett square.

Now all this isn't to say that you should never talk about alleles as being either the particular base or the whole gene sequence, but rather that different people will use both at different times. At the level of DNA, the concept of an allele is ill-defined.

7.6 years ago
jotan ★ 1.3k

"1) Does allele refer to the entire gene sequence or one position of the gene sequence?"

The short answer is: An allele is a complete gene sequence.

"Here we refer to allele as one base of a gene."

That is an incorrect (or highly contentious) definition of an allele. One base change in a gene is a SNP. The entire gene sequence is an allele.

My long answer is that anyone who is interested in "bioinformatics" should try to give equal weighting to training in both "bio" and "informatics". When people start from a "biology" base, the most common advice is that they should take a basic computing course and learn some formalised computing. I think the inverse is equally true and people starting from a "computing" base should try to take a formal course in biology, in this case, genetics and evolution.

That's not meant as a criticism of the OP, just my general frustration at reading loads of 'genomics' papers where the authors clearly do not understand how genes function or how evolution works, and wild conclusions are reached because of 'statistical significance'.


BSc Genetics, PhD Molecular Biology here. The definition of an allele as a version of a gene became obsolete when the molecular basis of genetics was discovered. Alleles refer to the possible bases of a variant.


I was a little too hasty in my reply. On closer reflection, I don't actually define an allele as a whole gene. I think my definition is more like "sequence variants in a population within a given window (>1bp) of DNA sequence".

I personally think it's silly to refer to a single base pair as an allele (as in the 2nd example above).

1) Because we already have a term which is widely understood (SNP).

2) Because it fails to take into account linkage and recombination.

Two (or more) SNPs located close together will almost always be inherited together, and quite often become fixed in a population together. For this reason, I think it's a mistake to annotate individual SNPs as completely separate "alleles". Why should each one be designated as an "allele" if some combinations are always found together?

Word definitions can be tricky. Experts reading the word in context will usually understand what the writer is trying to convey, but that only comes from a lot of exposure. Hence, a course in genetics.

But if experts cannot agree on the definition of a technical term, it may be time to retire the term.


Firstly, the term "SNP" refers to the locus. Secondly, not all variants are SNPs, some are indels and some are structural variants.

A set of alleles of variants in LD (linkage disequilibrium) with each other is referred to as a haplotype.
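
As a rough sketch of the haplotype idea in Python: given phased VCF-style genotypes (such as "0|1") at several nearby variants in one diploid sample, the alleles on the left of each "|" form one haplotype and the alleles on the right form the other. The calls below are hypothetical, and the function is my own illustration rather than any tool's API.

```python
def haplotypes(variants):
    """Build the two haplotypes of a diploid sample from phased genotypes.

    `variants` is a list of (ref, alt, gt) tuples in genomic order, where
    gt is a phased VCF-style genotype string such as "0|1".
    """
    first, second = [], []
    for ref, alt, gt in variants:
        if "|" not in gt:
            raise ValueError(f"genotype {gt!r} is not phased")
        # Allele index 0 is REF; 1, 2, ... are the comma-separated ALT alleles.
        alleles = [ref] + alt.split(",")
        left, right = gt.split("|")  # assumes exactly two copies (diploid)
        first.append(alleles[int(left)])
        second.append(alleles[int(right)])
    return tuple(first), tuple(second)


# Hypothetical phased calls at three nearby variants.
calls = [("G", "C", "0|1"), ("C", "G", "0|1"), ("T", "A", "1|0")]
hap1, hap2 = haplotypes(calls)
print(hap1)  # ('G', 'C', 'A')
print(hap2)  # ('C', 'G', 'T')
```

Each position still has its own alleles; the haplotype is the combination of them that travels together on one chromosome copy.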

The fact is, you can argue until you're blue in the face about how a definition should be used in a certain way, and about how all the people using it in another way are wrong, but if the majority are using it that way then you just have to go with the flow.


I don't want to argue, and I don't even necessarily disagree with you but:

"The fact is, you can argue until you're blue in the face about how a definition should be used in a certain way, and about how all the people using it in another way are wrong, but if the majority are using it that way then you just have to go with the flow."

The first Google hit for "allele" is a Nature Education website where an allele is described as a "variant form of a gene."

If we work on a 'definition by majority', then an allele is a genic variant.

In any case, if we are all using the same word to mean different things, do you think there might be a problem? If a "bioinformatician" uses a word to refer to one thing, and a "biochemist" understands a different thing, I don't think anybody wins. Just like no one wins by continuing this discussion :)


I agree. Sort of.

For example, the term that I don't really think makes sense these days in animal genetics is "gene". The complex nature of our modern understanding of inheritance means that genes as originally defined don't exist. There are things we refer to as genes today, but they aren't the same as what was originally meant by the word, and what counts as a gene is pretty arbitrary. Sometimes overlapping transcripts are called genes, but this is only exonic overlap, and which parts of genes are exons really is just a case of how hard you look. Definitions involving proteins don't work, because what about lincRNAs? And why are enhancers not called genes? They are stretches of DNA that code for phenotypes and are inherited independently.

But "gene" is a useful casual term, even if it's hard to define in a strict sense. I think the same is probably true of the word "allele".


In Ensembl we use gene to refer to a genomic locus where transcription occurs. If the transcripts share exons, they are part of the same gene (although there are counter-examples such as readthrough transcripts where we would not define them as the same gene). This refers to transcription only, and not translation, so UTR regions and non-coding transcripts are included in this definition.

You're right, the old-school definition is a unit of inheritance. This is again obsolete (i.e. used by people who don't really work in the field). You also have the terrible lay-person definition whereby a gene is a thing that causes a disease (e.g. the Daily Mail headline "Gene for obesity found", where in the text we see "people with the gene are fat").
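
The shared-exon rule described above can be sketched in Python as a grouping problem: transcripts that share an exon, directly or through a chain of other transcripts, end up in the same "gene". This is only an illustration under simplifying assumptions, not Ensembl's actual pipeline: "sharing" is taken to mean identical (start, end) coordinates on one chromosome and strand, and exceptions such as readthrough transcripts are ignored. The transcript IDs and coordinates are made up.

```python
from collections import defaultdict


def group_transcripts(transcripts):
    """Group transcripts into 'genes': transcripts sharing an exon are merged.

    `transcripts` maps a transcript ID to a set of exons, each exon being a
    (start, end) tuple on the same chromosome and strand.
    """
    # Union-find: every transcript starts as the root of its own group.
    parent = {tid: tid for tid in transcripts}

    def find(tid):
        while parent[tid] != tid:
            parent[tid] = parent[parent[tid]]  # path halving
            tid = parent[tid]
        return tid

    first_seen = {}  # exon -> first transcript observed with that exon
    for tid, exons in transcripts.items():
        for exon in exons:
            if exon in first_seen:
                # Shared exon: merge this transcript's group with the earlier one.
                parent[find(tid)] = find(first_seen[exon])
            else:
                first_seen[exon] = tid

    genes = defaultdict(set)
    for tid in transcripts:
        genes[find(tid)].add(tid)
    return sorted(sorted(members) for members in genes.values())


# Hypothetical transcripts: T1 and T2 share an exon, T2 and T3 share another.
example = {
    "T1": {(100, 200), (300, 400)},
    "T2": {(300, 400), (500, 600)},
    "T3": {(500, 600)},
    "T4": {(900, 1000)},
}
print(group_transcripts(example))  # [['T1', 'T2', 'T3'], ['T4']]
```

T1 and T3 share no exon with each other, yet land in the same group through T2, which is one reason the boundary of a "gene" depends on which transcripts have been annotated.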


To complicate things: a SNP has two alleles. So it's not exclusively on the gene level.

I fully agree with the second part of your answer. Asking a question is no substitute for picking up a few biology and genetics books. Understanding these principles is crucial in bioinformatics.

7.6 years ago
William ★ 5.3k

Besides picking up a biology / genomics / genetics intro book, I recommend looking at the computational definitions of VariantContext, GenotypesContext, Genotype and Allele in the public HTS-JDK library, which is used by many bioinformatics tools.

These are in hierarchical order, so start with VariantContext. This order is analogous to the biological order of concepts that you need to understand:

VariantContext:

https://github.com/samtools/htsjdk/blob/fbba5364e1809de071bc479f30e4e2c8b17f5bbe/src/main/java/htsjdk/variant/variantcontext/VariantContext.java

GenotypesContext:

https://github.com/samtools/htsjdk/blob/fbba5364e1809de071bc479f30e4e2c8b17f5bbe/src/main/java/htsjdk/variant/variantcontext/GenotypesContext.java

Genotype:

https://github.com/samtools/htsjdk/blob/fbba5364e1809de071bc479f30e4e2c8b17f5bbe/src/main/java/htsjdk/variant/variantcontext/Genotype.java

Allele:

https://github.com/samtools/htsjdk/blob/912c28bec415c430b43515652ccaf13222b07e7b/src/main/java/htsjdk/variant/variantcontext/Allele.java
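
For readers who want the shape of that hierarchy before reading the Java source, here is a deliberately simplified Python mirror. It is not the HTS-JDK API: the class names follow the library, but every field and method below is my own assumption for illustration, and a plain dict stands in for GenotypesContext. The values reuse the position from the question.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Allele:
    """One possible sequence at a variant site: one base or many."""
    bases: str
    is_reference: bool = False


@dataclass
class Genotype:
    """The alleles one sample carries at a site (two for a diploid)."""
    sample: str
    alleles: List[Allele]

    def is_het(self):
        # Heterozygous if the sample carries more than one distinct allele.
        return len({a.bases for a in self.alleles}) > 1


@dataclass
class VariantContext:
    """A site plus all alleles seen there and every sample's genotype."""
    contig: str
    start: int
    alleles: List[Allele]
    # Plays the role of GenotypesContext: sample name -> Genotype.
    genotypes: Dict[str, Genotype] = field(default_factory=dict)


ref, alt = Allele("G", is_reference=True), Allele("C")
site = VariantContext("chr1", 762589, [ref, alt])
for name in ("father", "son"):
    site.genotypes[name] = Genotype(name, [ref, alt])

print(site.genotypes["son"].is_het())  # True
```

In this model the allele belongs to the variant site rather than to the gene, which matches the VCF usage in Case 2 of the question.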
