Question

Pairwise Genome Alignment Of Two ~30Mb Genomes To Give A Genetic Distance Estimate?

3

12.7 years ago

Ahdf-Lell-Kocks ★ 1.6k

I would like to estimate the genetic distance between two Leishmania genomes, expressed as something like X average nucleotide substitutions per site. I thought about doing a pairwise genome alignment of the two genome assemblies and then calculating the average number of nucleotide substitutions per site from the alignment. What tools should I look into for that?

pairwise genomics • 5.9k views

updated 12.7 years ago by 2184687-1231-83- ★ 5.1k • written 12.7 years ago by Ahdf-Lell-Kocks ★ 1.6k

score 6 · Answer 1 · 2012-03-14

For genome alignment, it's hard to go past the MUMmer software suite. Chromosome alignment is very fast, typically a few seconds.

You might find the MUMmer SNP detection pipeline useful. Assuming you want to align 2 chromosomes, in FASTA files named chr1.fa and chr2.fa, start with nucmer:

nucmer --prefix=chr1_chr2 chr1.fa chr2.fa

Then run show-snps on the resulting delta file:

show-snps -Clr chr1_chr2.delta > chr1_chr2.snps

Sample output (first few lines):

    [P1]  [SUB]  [P2]      |   [BUFF]   [DIST]  |  [LEN R]  [LEN Q]  | [FRM]  [TAGS]
========================================================================================
      24   A G   797603    |       24       24  |  5315120  5248520  |  1  1  chr1     chr2
      53   . T   797633    |       29       53  |  5315120  5248520  |  1  1  chr1     chr2
     759   C T   798339    |       45      759  |  5315120  5248520  |  1  1  chr1     chr2

You can then use SNP totals to measure genetic distance. One approach to this is described in:

Completion of the Genome Sequence of Brucella abortus and Comparison to the Highly Similar Genomes of Brucella melitensis and Brucella suis

which states in the methods section:

SNP totals were used as a measure of genetic distance for the neighbor-joining tree [Ps = (ΣSNP count/1,000)] construction with MEGA2.
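To turn the SNP table into the "substitutions per site" figure the question asks for, one rough option is to count the substitution rows and divide by the number of aligned sites. The sketch below is only an illustration under stated assumptions: it parses the default (non-tabular) show-snps layout shown above, treats rows containing `.` as indels rather than substitutions, and expects the aligned length to be supplied separately (for example from summed alignment lengths, which show-snps does not report). The function names are hypothetical, and the Jukes-Cantor correction is one common choice for multiple-hit correction, not something the cited paper uses.

```python
import math


def parse_show_snps(lines):
    """Count substitutions and indels in default show-snps output.

    Assumes whitespace-separated rows of the form:
    P1  REF_BASE  QRY_BASE  P2 | ...
    """
    subs = 0
    indels = 0
    for line in lines:
        fields = line.split()
        # Data rows start with an integer position; header rows do not.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        ref_base, qry_base = fields[1], fields[2]
        if "." in (ref_base, qry_base):
            indels += 1  # '.' marks a gap in one of the sequences
        else:
            subs += 1
    return subs, indels


def p_distance(subs, aligned_sites):
    """Observed proportion of differing sites."""
    return subs / aligned_sites


def jukes_cantor(p):
    """Jukes-Cantor (JC69) corrected substitutions per site."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


sample = """\
    [P1]  [SUB]  [P2]      |   [BUFF]   [DIST]  |  [LEN R]  [LEN Q]  | [FRM]  [TAGS]
========================================================================================
      24   A G   797603    |       24       24  |  5315120  5248520  |  1  1  chr1     chr2
      53   . T   797633    |       29       53  |  5315120  5248520  |  1  1  chr1     chr2
     759   C T   798339    |       45      759  |  5315120  5248520  |  1  1  chr1     chr2
""".splitlines()

subs, indels = parse_show_snps(sample)
# aligned_sites=1000 is a made-up value purely for illustration.
p = p_distance(subs, aligned_sites=1000)
d = jukes_cantor(p)
```

For closely related genomes the corrected and uncorrected values are nearly identical; the correction only starts to matter as divergence grows.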

score 3 · Answer 2 · 2012-03-14

12.7 years ago

ALchEmiXt ★ 1.9k

For large complete genome alignments we use the MUMmer suite as suggested, but fine-tuned with the scripts described in this MUMi paper. The method finds non-overlapping Maximal Unique Matches on both the sense and anti-sense strands and turns their total length into a matching percentage by dividing by the average genome size.

We have used this quite successfully to align larger genomes, as well as bacterial and viral species, reproducing CGH array experiments and many other studies. Using pairwise comparisons, we calculate complete phylogenetic trees (of N>20 bacterial species) within 10 minutes.

The paper also provides the parsing scripts.
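As a rough illustration of the "matched length divided by average genome size" idea described above, here is a minimal sketch. It is not the published script: it assumes the matches are already available as hypothetical `(start, length)` pairs on one genome, counts overlapping bases only once by merging intervals (the real scripts may handle overlaps differently), and reports one minus the matching fraction as a distance.

```python
def covered_length(matches):
    """Total bases covered by (start, length) matches, overlaps counted once."""
    total = 0
    cur_start = None
    cur_end = None
    for start, length in sorted(matches):
        end = start + length  # half-open interval [start, end)
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def mum_distance(matches, len_a, len_b):
    """1 - (matched bases / average genome length)."""
    average_length = (len_a + len_b) / 2
    return 1.0 - covered_length(matches) / average_length


# Toy numbers for illustration only.
toy_matches = [(0, 100), (50, 100), (300, 50)]
toy_distance = mum_distance(toy_matches, len_a=1000, len_b=1000)
```

Values near 0 indicate genomes that are almost fully covered by unique matches; values near 1 indicate very little shared sequence.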

0

PS: the only requirement for this to work is a running version of MUMmer; the provided script calculates the rest :-)

12.7 years ago by ALchEmiXt ★ 1.9k

score 2 · Answer 3 · 2012-02-28

12.7 years ago

Zev.Kronenberg 12k

Here is what I have done to get a distance matrix.

For every variable locus, code each individual as 0 or 1 (1 if it has a non-reference base at that position). Then simply count the differences between the two binary strings.

If you are familiar with Perl, you can use Bit::Vector to accomplish this. It is pretty effective: I have computed all pairwise distances between 40 genomes of 1.3 Gb each in a few hours.
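The same idea can be sketched without Bit::Vector. The example below swaps in Python, using plain integers as bit vectors; it is a minimal illustration of the 0/1 coding and difference count described above, not the code used for the 40-genome run. The sample names and calls are made up.

```python
def encode(calls):
    """Pack 0/1 calls (one per variable locus) into a single integer."""
    bits = 0
    for i, call in enumerate(calls):
        if call:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of loci at which two encoded individuals differ."""
    return bin(a ^ b).count("1")


def distance_matrix(samples):
    """All pairwise difference counts for a dict of name -> 0/1 calls."""
    vectors = {name: encode(calls) for name, calls in samples.items()}
    return {
        (x, y): hamming(vectors[x], vectors[y])
        for x in vectors
        for y in vectors
    }


# Hypothetical calls at four variable loci (1 = non-reference base).
toy_samples = {
    "A": [0, 1, 1, 0],
    "B": [1, 1, 0, 0],
    "C": [0, 1, 1, 0],
}
toy_matrix = distance_matrix(toy_samples)
```

Dividing each count by the number of sites compared would turn these raw difference counts into a per-site distance.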

0

Thanks. Also how shall I go about creating the pairwise alignment then?

12.7 years ago by Ahdf-Lell-Kocks ★ 1.6k

score 2 · Answer 4 · 2012-03-15

12.7 years ago

2184687-1231-83- ★ 5.1k

BLAST
LASTZ
MUMmer
CHAOS
GRIMM-synteny
DRIMM-synteny
Mercator
Enredo
OSfinder
SuperMap
progressiveMauve
MUGSY
MAVID
LAGAN/Multi-LAGAN
DIALIGN
SeqAn::T-Coffee
FSA
Pecan
NUCmer/PROmer
MULTIZ/TBA
AXTCHAIN/CHAINNET
