Hello all, I am trying to resolve the phylogeny of a bacterial group. Trees inferred using conventional sequence-based approaches conflict with one another, so I'm thinking about using genome rearrangement data to build trees instead. I first used Mauve to align the genomes and identify the homologous genomic blocks. Then I exported the data (the permutation matrix) and tried other programs to build a tree from it. I'm not interested in the rearrangement history, only in the species tree. So far I have tried MGR, MGRA and BADGER. MGR runs, but extremely slowly. The latter two programs just don't work for my case. So I am asking here whether anyone knows of a better solution. Thank you for taking the time to read this!
Can you try coding the rearrangements as 1/0 characters? Old-school cladistics uses this for all kinds of morphological characters.
I don't know if there is a way to code it as binary data; if there were, I would already have solved it using RAxML...
It's maybe a little off-base, but there was a paper at ISMB last month which did something similar, but using FISH copy number data, and reconstructing human cancer phylogenies. More importantly, they built quite a specialised and highly efficient piece of software to do just that. I wonder if your problem might not be similar enough that you could adapt their method?
http://bioinformatics.oxfordjournals.org/content/29/13/i189.full
Thanks for your suggestion! I browsed the program FISHtrees. Unfortunately it does not seem to be the type we are looking for. Genome rearrangement data is not alignable binary or multi-state data. It's something like:
It looks to me like that would convert quite well to a binary matrix, with columns for genome regions, rows for species (or individuals), and each entry containing {1,0} to denote presence or absence of that region?
I'm afraid that isn't the case. It is the order of the genes that matters, not the presence/absence of genes at each genomic locus.
Here's a paper where they coded gene order and gene presence/absence as a matrix for baculoviruses. It sounds like what you are looking for: http://www.ncbi.nlm.nih.gov/pubmed/11483757
Yes it is! Thanks for recommending this! I later found a couple of related articles, including the latest ones, such as: www.ncbi.nlm.nih.gov/pubmed/23424133
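For anyone landing here later, one common way to turn gene order into a 0/1 matrix is to score the presence or absence of each adjacency between neighbouring blocks, rather than of the blocks themselves. The sketch below is only an illustration of that general idea, not the method of the papers linked above. It assumes each genome is a single circular chromosome given as a signed permutation in which every block occurs exactly once (roughly what you would get after restricting a Mauve permutation export to blocks shared by all genomes). The function names and the toy genomes are made up for the example.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Tuple[int, int]


def adjacencies(perm: List[int], circular: bool = True) -> Set[Adjacency]:
    """Return the signed adjacencies of one genome in a strand-independent form."""
    pairs = list(zip(perm, perm[1:]))
    if circular and len(perm) > 1:
        pairs.append((perm[-1], perm[0]))  # close the circle
    # (a, b) read on the opposite strand is (-b, -a); keep one canonical form.
    return {min((a, b), (-b, -a)) for a, b in pairs}


def encode(genomes: Dict[str, List[int]], circular: bool = True):
    """Build a binary matrix: one row per genome, one column per adjacency."""
    adj = {name: adjacencies(perm, circular) for name, perm in genomes.items()}
    columns = sorted(set().union(*adj.values()))
    matrix = {
        name: "".join("1" if col in adj[name] else "0" for col in columns)
        for name in genomes
    }
    return columns, matrix


def to_phylip(matrix: Dict[str, str]) -> str:
    """Write the matrix as relaxed PHYLIP text (name, space, characters)."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_chars}"]
    lines += [f"{name} {row}" for name, row in matrix.items()]
    return "\n".join(lines) + "\n"


# Toy data (hypothetical): genome B differs from A by one inversion of blocks 2-3.
genomes = {"A": [1, 2, 3, 4], "B": [1, -3, -2, 4]}
columns, matrix = encode(genomes)
print(to_phylip(matrix))
```

The resulting file can then be given to any program that accepts binary characters (parsimony, or a binary-model likelihood analysis such as the one RAxML offers). Note that adjacencies are not independent characters, so support values from such a matrix should be read with some caution.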