Pairwise Genome Alignment
14.1 years ago
Andrea_Bio ★ 2.8k

Hi

Sorry for yet another basic question, but what exactly is a pairwise genome alignment between two organisms, e.g. human/chicken?

I remember the difference between algorithms for global/local alignments but I don't remember using genome alignments. I looked for a definition online and in several books.

Is it just a file of all of the areas that align between 2 genomes? Are they publicly available or do you have to prepare them yourself? Are they 'redone' between different genome assemblies?

What would be the benefit of aligning the whole genome in this way (if that is indeed the correct interpretation) rather than creating alignments for just your areas of interest?

thanks in advance

alignment genome pairwise • 16k views

Consider one thing: the human genome was generated by sequencing BAC clones, while the chicken genome was probably done by shotgun sequencing. This means that in the chicken genome the duplicated regions will all be clustered together, and there will be more holes in the most repetitive regions. So a genome-vs-genome alignment can contain artifacts that arise because the two genomes were sequenced with different techniques.

14.1 years ago

Two important applications of genome alignments are:

  • Find conserved non-coding sequences (CNS). These are shared regulatory elements that have important functions. See VISTA's enhancer database, and pay attention to how they use the genome alignments to extract candidates.
  • Find conserved synteny and genome rearrangements. Shared synteny (or its disruption, called breakpoints) can be used as a phylogenetic signal to sort out species relationships.

It is not trivial to build genome alignments accurately, and doing so often requires two core steps: generating anchors, and chaining the anchors to form large, unambiguous synteny blocks. For example, in the BLASTZ/CHAIN/NET pipeline, BLASTZ generates the anchors and CHAIN/NET groups them; in the LAGAN/SUPERMAP pipeline, LAGAN generates the anchors and SUPERMAP groups them.
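To make the "chain the anchors" step concrete, here is a minimal sketch of collinear chaining with a simple quadratic dynamic program. It is not the algorithm used by CHAIN/NET or SUPERMAP. The `Anchor` tuple layout, the function name `chain_anchors`, and the scoring (total anchor length, no gap penalties, same-strand anchors only) are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (start in genome A, start in genome B, length).
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Return the collinear chain of anchors with the largest total length."""
    if not anchors:
        return []
    ordered = sorted(anchors)            # sort by start in A, then start in B
    n = len(ordered)
    best = [a[2] for a in ordered]       # best chain score ending at anchor i
    prev = [-1] * n                      # back-pointer to the previous anchor
    for i, (a_i, b_i, len_i) in enumerate(ordered):
        for j in range(i):
            a_j, b_j, len_j = ordered[j]
            # Anchor j may precede anchor i only if it ends before i starts
            # in BOTH genomes (no overlap, same order).
            if a_j + len_j <= a_i and b_j + len_j <= b_i:
                if best[j] + len_i > best[i]:
                    best[i] = best[j] + len_i
                    prev[i] = j
    # Trace back from the best-scoring end point.
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Toy anchors (invented coordinates); the third one lies off the main diagonal.
anchors = [(0, 0, 10), (15, 12, 8), (30, 5, 20), (40, 40, 10)]
print(chain_anchors(anchors))
```

With these toy anchors the off-diagonal hit `(30, 5, 20)` is dropped, because it cannot be placed in the same order as the others in both genomes. Real pipelines also handle reverse-strand anchors and gap costs, which this sketch ignores.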

If you work with vertebrates, there is no need to repeat the exercise yourself. The UCSC Genome Browser offers downloads of pre-built alignments in MAF format.
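If you have not seen MAF before: each alignment block starts with an `a` line, followed by one `s` line per sequence with the fields source, start, size, strand, source size, and the gapped text. Below is a minimal reading sketch, assuming plain uncompressed MAF. The function name `parse_maf` and the sample block (sequence names, coordinates, and sizes) are made up for illustration, and other line types (`i`, `e`, `q`) are simply skipped.

```python
from typing import Dict, Iterable, Iterator, List


def parse_maf(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield one list of sequence records per MAF alignment block."""
    block: List[Dict[str, object]] = []
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            src, start, size, strand, src_size, text = fields[1:7]
            block.append({
                "src": src,                # e.g. assembly.chromosome
                "start": int(start),       # zero-based start on the given strand
                "size": int(size),         # aligned bases, gaps not counted
                "strand": strand,
                "src_size": int(src_size),
                "text": text,              # alignment row, '-' marks gaps
            })
        # Header/comment lines ('#...') and other line types are ignored.
    if block:
        yield block


# Invented example block; names and numbers are not from a real download.
sample = """##maf version=1
a score=100.0
s human.chr1   100 8 + 1000 ACGT-ACGT
s chicken.chr1 200 9 + 1000 ACGTTACGT
"""

for blk in parse_maf(sample.splitlines()):
    for rec in blk:
        print(rec["src"], rec["start"], rec["size"], rec["text"])
```

For real work a maintained parser is the safer choice; this is only meant to show what is inside the file.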

There are also quite a few graphical tools, for example MAUVE (often used for prokaryotes). There is also a web-based system called CoGe, with 10,000+ genomes, updated weekly, so you can just pick two genomes and align them using its SynMap pipeline, which also creates genomic dot plots for you. It takes some learning, but it is definitely worth it.

To your last point, why this is better than some local alignments: well, they are the same animal if you know which sequences to align. So think of genome alignment as BLAST (find similar sequences) + CLUSTALW (align). Most pipelines also have built-in rules to make it more likely that you find orthologous sequences.


great answer, many thanks

14.1 years ago
Neilfws 49k
  1. Yes, a pairwise genome alignment is essentially a file of two aligned genomes.
  2. Some are available, e.g. VISTA genome alignments. There are also plenty of software tools available to do it yourself. A popular tool is MUMmer; another is LAGAN.
  3. Are they re-done as genome assemblies are revised? That would depend on whoever maintains the data. Hopefully, and in the best cases, yes they are.
  4. The benefit is that you can visualize large "blocks" of genome structure. These may include synteny (see this resource on yeast genome synteny) or large-scale rearrangements: duplication, deletion, inversion.

Genome alignment obviously makes most sense for organisms that are more closely related. Human-chicken might be interesting; human-Nanoarchaeum is not.

Genome alignment algorithms are often described as glocal; that is, they try to maximize local alignment whilst trying to include the start/end of one of the pairs. There is quite a good Wikipedia page on sequence alignment, if you need a simple guide or clarification.
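One simple way to read the "include the start/end of one of the pair" idea is a semi-global alignment: one sequence must be aligned end to end, while unaligned bases at either end of the other are free. The sketch below computes only the score under that reading. It is not the glocal algorithm of any particular tool, and the function name `semiglobal_score` and the scoring values are assumptions chosen for illustration.

```python
def semiglobal_score(query: str, target: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best score for aligning ALL of `query` somewhere inside `target`.

    Leading and trailing target bases are not penalised; every query
    base must be either aligned or paid for as a gap.
    """
    # prev[j]: best score for the query prefix seen so far, ending at target[:j].
    # The first row is all zeros: skipping a target prefix is free.
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        curr = [prev[0] + gap]           # query bases before the target start cost a gap
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align the two bases
                            prev[j] + gap,       # gap in target
                            curr[j - 1] + gap))  # gap in query
        prev = curr
    # Taking the max over the last row makes the target suffix free as well.
    return max(prev)


print(semiglobal_score("ACGT", "TTACGTTT"))
```

A purely global alignment would penalise the flanking `TT` and `TTT`, and a purely local one would be allowed to drop part of the query; the semi-global version sits in between.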


thank you for a thorough answer


Do you take one organism as the 'reference' and then align the other organism to it, since naturally there won't be a one-to-one correspondence between the chromosomes of the two organisms?


You'd align pairs of chromosomes.


LASTZ is new to me. Thanks for this one.

14.1 years ago

Allow me to extend Haibao Tang's response. An alignment of the two genomes will give the blocks of synteny, i.e. of conserved gene order. Knowing the number of such blocks can give an idea of how distant the genomes are, not necessarily in years but in terms of genome rearrangement and overall organization. Furthermore, one can look within a block and see what rearrangements took place afterward (after the two organisms diverged from a common ancestor). Was there an expansion or contraction of this or that gene family since that time? That's a pretty fundamental question in terms of evolution and divergence.

For many of these comparisons, the repeats in the genome fall out and do not enter the alignment. This is because many repeats are rather species-specific; for example, human Alu elements do not align well to mouse B elements. A comparison of the human and chimpanzee genomes will show where, and maybe when, Alu expansion occurred, as Alus are more conserved between these two species.
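As a toy illustration of counting synteny blocks, suppose the genes of one genome are numbered 1..n in order, and the second genome is written as a signed permutation of those numbers (a minus sign meaning the gene is inverted). A breakpoint is then any adjacent pair that is not consecutive. This representation and the function name `count_breakpoints` are assumptions for the sketch, not the output of any alignment tool.

```python
from typing import Sequence


def count_breakpoints(order: Sequence[int]) -> int:
    """Count breakpoints in a signed gene order relative to 1, 2, ..., n."""
    n = len(order)
    # Frame the permutation with 0 and n+1 so the chromosome ends count too.
    framed = [0] + list(order) + [n + 1]
    # An adjacency (a, b) is conserved when b == a + 1; this also covers
    # inverted runs such as (-4, -3).
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


# Genes 3 and 4 have been inverted as a unit in the second genome.
order = [1, 2, -4, -3, 5]
breakpoints = count_breakpoints(order)
print(breakpoints, "breakpoints,", breakpoints + 1, "conserved blocks")
```

Here the single inversion creates two breakpoints and therefore three conserved blocks, which is the kind of number that gives a rough idea of how rearranged two genomes are.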

14.1 years ago

If you are looking for reliable alignments of prokaryotic genomes, there is the quite old but nevertheless very useful ATGC (Alignable Tight Genomic Clusters) database. It is optimized for research on microevolution, as it removes much of the variability (rearrangements, recombinations, etc.) from the alignments.
