Pairwise Genome Alignment Of Two ~30Mb Genomes To Give A Genetic Distance Estimate?
12.8 years ago
Ahdf-Lell-Kocks ★ 1.6k

I would like to estimate the genetic distance between two Leishmania genomes, expressed as something like X average nucleotide substitutions per site. My plan is to do a pairwise genome alignment of the two assemblies and then calculate the average number of nucleotide substitutions per site from that alignment. What tools should I look into for this?

pairwise genomics • 5.9k views
12.7 years ago
Neilfws 49k

For genome alignment, it's hard to go past the MUMmer software suite. Chromosome alignment is very fast, typically a few seconds.

You might find the MUMmer SNP detection pipeline useful. Assuming you want to align 2 chromosomes, in FASTA files named chr1.fa and chr2.fa, start with nucmer:

nucmer --prefix=chr1_chr2 chr1.fa chr2.fa

Then run show-snps on the resulting delta file:

show-snps -Clr chr1_chr2.delta > chr1_chr2.snps

Sample output (first few lines):

    [P1]  [SUB]  [P2]      |   [BUFF]   [DIST]  |  [LEN R]  [LEN Q]  | [FRM]  [TAGS]
========================================================================================
      24   A G   797603    |       24       24  |  5315120  5248520  |  1  1  chr1     chr2
      53   . T   797633    |       29       53  |  5315120  5248520  |  1  1  chr1     chr2
     759   C T   798339    |       45      759  |  5315120  5248520  |  1  1  chr1     chr2

You can then use SNP totals to measure genetic distance. One approach to this is described in:

Completion of the Genome Sequence of Brucella abortus and Comparison to the Highly Similar Genomes of Brucella melitensis and Brucella suis

which states in the methods section:

SNP totals were used as a measure of genetic distance for the neighbor-joining tree [Ps = (ΣSNP count/1,000)] construction with MEGA2.
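To get from the show-snps output to the per-site figure asked for in the question, something like the following Python sketch could be used. It assumes the pipe-separated layout shown in the sample output above, counts rows with "." in either base column as indels rather than substitutions, and takes the number of aligned bases as a caller-supplied value (for example, summed from show-coords output). The function names and the `aligned_bases` parameter are hypothetical. The Jukes-Cantor correction is a standard formula added here for illustration and is not part of the method quoted above.

```python
import math


def count_variants(lines):
    """Count substitutions and indels in show-snps rows (pipe-separated layout)."""
    substitutions = 0
    indels = 0
    for line in lines:
        # The first pipe-separated column holds: P1, ref base, query base, P2.
        fields = line.split("|")[0].split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # header, separator, or blank line
        ref_base, qry_base = fields[1], fields[2]
        if "." in (ref_base, qry_base):
            indels += 1  # "." marks an insertion or deletion
        else:
            substitutions += 1
    return substitutions, indels


def substitutions_per_site(substitutions, aligned_bases):
    """Observed proportion of differing sites (p-distance)."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return substitutions / aligned_bases


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    sample = [
        "    [P1]  [SUB]  [P2]      |   [BUFF]   [DIST]  |  [LEN R]  [LEN Q]  | [FRM]  [TAGS]",
        "      24   A G   797603    |       24       24  |  5315120  5248520  |  1  1  chr1     chr2",
        "      53   . T   797633    |       29       53  |  5315120  5248520  |  1  1  chr1     chr2",
        "     759   C T   798339    |       45      759  |  5315120  5248520  |  1  1  chr1     chr2",
    ]
    subs, indels = count_variants(sample)
    print(subs, indels)  # 2 substitutions, 1 indel in the three sample rows
```

Dividing by the aligned length rather than the full sequence length in the [LEN R] column avoids underestimating the distance when parts of the genomes do not align.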

12.7 years ago
ALchEmiXt ★ 1.9k

For large complete-genome alignments we use the MUMmer suite as suggested, but fine-tuned with the scripts described in this MUMi paper. The method finds sense and antisense non-overlapping Maximal Unique Matches (MUMs) and turns their total length into a matching percentage by dividing by the average genome size.

We have used this quite successfully to align larger genomes, bacterial as well as viral, reproducing CGH array experiments and many other studies. Using pairwise comparisons, we calculate complete phylogenetic trees (N > 20 bacterial species) within 10 minutes.

The paper also provides the parsing scripts.
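As a rough illustration of the calculation described above (not the paper's scripts), the Python sketch below merges overlapping match intervals on one genome, sums the covered length, and divides by the average genome size. The distance form `1 - covered / average_length` is how I understand the MUMi index to be defined. The interval representation (half-open `(start, end)` pairs) and all function names are assumptions made for this example.

```python
def covered_length(intervals):
    """Total length covered by half-open (start, end) intervals, overlaps merged."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def mumi_like_distance(intervals, genome_a_length, genome_b_length):
    """1 minus (matched length / average genome length)."""
    average_length = (genome_a_length + genome_b_length) / 2
    if average_length <= 0:
        raise ValueError("genome lengths must be positive")
    return 1 - covered_length(intervals) / average_length


if __name__ == "__main__":
    # Hypothetical matches on a 50 bp genome compared with another 50 bp genome.
    matches = [(0, 10), (5, 15), (20, 30)]
    print(covered_length(matches))              # 25
    print(mumi_like_distance(matches, 50, 50))  # 0.5
```

The published scripts trim overlaps on both genomes and work directly from MUMmer output, so they should be preferred for real analyses.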


PS: the only requirement for this to work is a running version of MUMmer; the provided script calculates the rest. :-)

12.8 years ago

Here is what I have done to get a distance matrix.

For every variable locus, code each individual as 0 or 1, where 1 means it has a non-reference base at that position. Then simply count the positions at which the two binary strings differ.

If you are familiar with Perl, you can use Bit::Vector to accomplish this. It is pretty effective: I have computed all pairwise distances between 40 genomes of 1.3 Gb each in a few hours.
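The same idea can be sketched in Python instead of Perl's Bit::Vector, using integers as bit vectors. This is only an illustration of the counting step: the input format (one string of "0"/"1" characters per sample, all of equal length) and the function names are assumptions, not what the answer's author actually ran.

```python
from itertools import combinations


def hamming_distance(bits_a, bits_b):
    """Number of loci at which two 0/1 strings differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("both samples must cover the same loci")
    if not bits_a:
        return 0
    # XOR leaves a 1 bit wherever the two samples disagree.
    differing = int(bits_a, 2) ^ int(bits_b, 2)
    return bin(differing).count("1")


def pairwise_distances(samples):
    """Map each unordered pair of sample names to its Hamming distance."""
    return {
        (name_a, name_b): hamming_distance(samples[name_a], samples[name_b])
        for name_a, name_b in combinations(sorted(samples), 2)
    }


if __name__ == "__main__":
    # Hypothetical samples scored at four variable loci.
    samples = {"s1": "1011", "s2": "1101", "s3": "1011"}
    for pair, distance in pairwise_distances(samples).items():
        print(pair, distance)
```

Dividing each count by the number of sites compared would give a per-site distance rather than a raw difference count.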


Thanks. How should I go about creating the pairwise alignment in the first place?

12.7 years ago
  • BLAST
  • LASTZ
  • MUMmer
  • CHAOS
  • GRIMM-synteny
  • DRIMM-synteny
  • Mercator
  • Enredo
  • OSfinder
  • SuperMap
  • progressiveMauve
  • MUGSY
  • MAVID
  • LAGAN/Multi-LAGAN
  • DIALIGN
  • SeqAn::T-Coffee
  • FSA
  • Pecan
  • NUCmer/PROmer
  • MULTIZ/TBA
  • AXTCHAIN/CHAINNET