What Is The Definition Of Gene Cluster
6
4
13.5 years ago
Free Man ▴ 180

I have read some papers, but I ran into a simple question:
What is the definition of a gene cluster (at the level of how genes are distributed on chromosomes)?
(Papers: "Large clusters of co-expressed genes in the Drosophila genome" and "Clustering of housekeeping genes provides a unified model of gene order in the human genome".)
Could anyone give me an example of how to calculate a gene cluster?

It seems they define a cluster as physically adjacent genes, but the distance between two genes may vary a great deal across the genome. So what is the permissible distance between two genes in a cluster?
Thank you!

gene • 16k views
ADD COMMENT
4
13.5 years ago

There are already nice answers here, but I would like to share my view on this.

A 'gene cluster' is more of a semantic label than a well-defined concept. IMHO, a gene cluster is a conceptual framework for describing a group of genes that share a common feature. This common feature could be:

  • Function (See an example here)
  • Pathway (See examples here - 1, and here )
  • Co-localization / Physical location (See an example here)
  • Co-expression (See an example here )
  • etc ...
ADD COMMENT
2
13.5 years ago

I don't think there is an official definition of cluster size in this sense. If someone reports, "we identified a cluster of genes that influence hat size", then I'd expect to find those genes somewhat near each other on the same chromosome. However, there's no reason why three genes (HAT1, HAT2, and HAT3, of course) in a two-megabase interval would be a bona fide cluster, while three genes in a five-megabase interval would not.
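To make the arbitrariness of a distance cutoff concrete, here is a minimal sketch that groups genes by a maximum permitted gap. The coordinates for HAT1, HAT2 and HAT3 are made up for illustration, the function name `clusters_by_max_gap` is hypothetical, and measuring the gap from start to start is a simplifying assumption rather than a standard.

```python
def clusters_by_max_gap(genes, max_gap):
    """Group genes into clusters of >= 2 members on the same chromosome.

    genes   -- list of (name, chromosome, start) tuples (hypothetical data)
    max_gap -- largest start-to-start distance (bp) allowed between neighbours
    """
    clusters = []
    current = []
    # Sort by chromosome, then by start coordinate
    for gene in sorted(genes, key=lambda g: (g[1], g[2])):
        same_chrom = bool(current) and gene[1] == current[-1][1]
        if same_chrom and gene[2] - current[-1][2] <= max_gap:
            current.append(gene)
        else:
            # Close the previous group; keep it only if it has >= 2 genes
            if len(current) >= 2:
                clusters.append([g[0] for g in current])
            current = [gene]
    if len(current) >= 2:
        clusters.append([g[0] for g in current])
    return clusters


# Made-up coordinates: gaps of 1.5 Mb and 3.0 Mb
genes = [
    ("HAT1", "chr1", 1_000_000),
    ("HAT2", "chr1", 2_500_000),
    ("HAT3", "chr1", 5_500_000),
]

print(clusters_by_max_gap(genes, 2_000_000))  # [['HAT1', 'HAT2']]
print(clusters_by_max_gap(genes, 3_000_000))  # [['HAT1', 'HAT2', 'HAT3']]
```

The same three genes form a two-gene or a three-gene cluster depending only on the cutoff, which is why a distance threshold on its own does not define a cluster.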

Genes in a gene cluster typically encode similar products, perhaps as the result of a gene duplication, and this implies that a cluster is defined more through common ancestry than through the interval between genes. See the Wikipedia entry for a brief overview. Using this definition, you would identify clusters primarily using sequence similarity rather than interval size.

I think I've seen the term used for genes that are purported to share a common function (perhaps they are co-amplified in tumors and are both oncogenes, for example), but this may be an abuse of the term.

ADD COMMENT
0

I think you are talking about a "gene family". I remember that in <[?]> there is a tip saying that clustering by gene family (homology) and clustering of adjacent genes (distribution on chromosomes) are different things. But I feel confused now...

ADD REPLY
2
13.5 years ago
Neilfws 49k

For the paper that you describe, it's the number of genes in the cluster rather than the distance between them which the authors believe to be significant. The key part of the paper reads as follows:

To investigate whether the observed distribution of genes differed from a random distribution, we generated a model of stochastic distribution using a random number generator. EST profiling identified 1,661 testes-specific genes, as described above. The stochastic distribution was generated by producing 1,661 random, non-repetitive numbers in the range of 1–13,290 (the total number of genes in the Drosophila genome). With the assumption that the row of numbers from 1 to 13,290 comprises the order of genes in the genome of D. melanogaster, each iteration therefore assigns random genomic positions to the 1,661 testes-specific genes. The proportion of genes found in clusters and the size distribution of clusters were calculated, and the values were averaged for 50 reiterations.

So what they are asking is: are tissue-specific (and hence, likely co-regulated) transcripts more likely to be located next to each other on the chromosome? They conclude that:

Although the number of two-gene clusters in the EPD prediction was only 6% higher than expected by chance, there were 2.5 times more three-gene clusters and five times more four-gene clusters in the EPD as compared with the STD prediction.

This may suggest a regulatory feature on the chromosome common to all genes in a cluster.
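The randomization described in the quoted methods can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the genome is treated as a single row of gene indices (as in the quoted text, so chromosome boundaries are ignored), and a "cluster" is taken to be a run of two or more consecutive indices.

```python
import random
from collections import Counter


def cluster_sizes(positions):
    """Return the sizes of all runs of >= 2 consecutive integers."""
    sizes = []
    run = 1
    ordered = sorted(positions)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur == prev + 1:
            run += 1            # still inside the same run of neighbours
        else:
            if run >= 2:
                sizes.append(run)
            run = 1
    if run >= 2:
        sizes.append(run)
    return sizes


def simulate(n_genes=13290, n_specific=1661, n_iter=50, seed=0):
    """Average clustering statistics over n_iter random placements.

    Returns (mean fraction of specific genes that fall in clusters,
             {cluster size: mean number of clusters of that size}).
    """
    rng = random.Random(seed)
    size_counts = Counter()
    fraction_sum = 0.0
    for _ in range(n_iter):
        # Random, non-repeating gene indices in the range 1..n_genes
        positions = rng.sample(range(1, n_genes + 1), n_specific)
        sizes = cluster_sizes(positions)
        size_counts.update(sizes)
        fraction_sum += sum(sizes) / n_specific
    mean_counts = {s: c / n_iter for s, c in sorted(size_counts.items())}
    return fraction_sum / n_iter, mean_counts


fraction, mean_counts = simulate()
print(f"mean fraction of genes in clusters: {fraction:.3f}")
for size, count in mean_counts.items():
    print(f"clusters of size {size}: {count:.1f} per iteration")
```

Comparing these chance expectations with the observed counts of two-, three- and four-gene clusters is what yields ratios like those quoted above.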

ADD COMMENT
0

I understand the main idea of the paper, but I do not know how they define a cluster. For example, in a cluster of 3 genes, why are these genes considered a cluster?

ADD REPLY
0

Because they are expressed specifically in testis and they are located next to each other. There's nothing more to it than that, in the case of this paper.

ADD REPLY
0

So how do you define that two genes are located next to each other? (Sorry to trouble you, but I can't contact the authors...)

ADD REPLY
0

They are not "defined as next to each other" - they just are next to each other. There is no distance measure. There is no clustering algorithm. In this paper, "a cluster" simply means "2 or more adjacent genes which are expressed specifically in testis." At least some of these cases are presumed to be "real" clusters because this situation occurs more often than "by chance". That's it. Nothing more complicated than that.
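As a minimal sketch of that definition, assuming you have the gene order for one chromosome and a set of genes flagged as testis-specific (the gene IDs and the function name `adjacent_clusters` below are hypothetical), a cluster is just a run of two or more flagged genes with no unflagged gene in between:

```python
from itertools import groupby


def adjacent_clusters(ordered_genes, specific, min_size=2):
    """Return runs of >= min_size neighbouring genes that are all in `specific`.

    ordered_genes -- gene IDs in chromosomal order (one chromosome)
    specific      -- set of gene IDs flagged as, e.g., testis-specific
    """
    clusters = []
    # groupby splits the ordered list into consecutive flagged/unflagged runs
    for is_specific, group in groupby(ordered_genes, key=lambda g: g in specific):
        members = list(group)
        if is_specific and len(members) >= min_size:
            clusters.append(members)
    return clusters


# Hypothetical gene order and expression flags
genes_on_chromosome = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
testis_specific = {"g2", "g3", "g4", "g6"}

print(adjacent_clusters(genes_on_chromosome, testis_specific))
# [['g2', 'g3', 'g4']]
```

Note that no coordinates are used at all: g6 is flagged but has no flagged neighbour, so it is not part of any cluster.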

ADD REPLY
2
13.5 years ago
K_Star ▴ 120

‘gene cluster’ or ‘gene neighbourhood’ = a genomic region that exhibits co-ordinated regulation of multiple genes (Lercher et al., 2002, Nature Genetics 31(2):180; Shopland et al., 2003, J. Cell Biol. 162(6):981-90).

ADD COMMENT
2
13.5 years ago

Before you drink the Kool-aid on the gene expression neighborhood literature and worry too much about the definition of what is likely an artefact of weak signals in large datasets, transcriptional noise, and a sociological will for this phenomenon to be true, please have a look at Meadows et al (2010) "Neighbourhood Continuity Is Not Required for Correct Testis Gene Expression in Drosophila". These authors perform the crucial experiment of disrupting putative gene expression neighborhoods using targeted genome rearrangements and show that the "linear organisation of genes in a neighbourhood is not necessary for correct gene expression".

That said, you are right to query what the definition of a gene expression neighborhood is since, outside tandemly duplicated genes that clearly share an evolutionary history, all definitions are arbitrary and therefore artificial.

ADD COMMENT
0

Nice reference. +1 for skepticism of received wisdom.

ADD REPLY
0
13.3 years ago
Paul • 0

@Casey I realise that you did not exclude the possibility of clustering for other reasons, such as to facilitate co-inheritance of co-adapted genes, but the strongly negative tone of your comment might lead one to conclude that gene clustering has no function. I understand your point that small amounts of noise in large datasets can provide examples to support nearly any hypothesis. It is, however, worthwhile to keep an open mind regarding gene clustering and its possible functions. Co-expression, co-inheritance, gene duplication, coordinated epigenetic control, sequential gene expression, and facilitated gene conversion may all occur, even if they are specific to a limited number of gene clusters.

ADD COMMENT
