I am interested in conducting a whole-genome phylogenetic analysis of closely related strains of bifidobacteria, with the hope of inferring a progenitor strain from the set. I am (possibly) looking for a program that will select the genes conserved among all the strains and then construct a tree (if that is not the correct method, please let me know as well).
I am not sure whether a whole-genome alignment is the best solution in this case. Prokaryotic genomes are usually sequenced with shotgun techniques, so the assemblies contain many repetitive and missing regions. Genome alignment tools are mainly designed to cope with these issues, not to detect conservation. So it may be better to extract the sequence of each gene, align each gene separately, and then rank the genes by dN/dS or another measure of conservation. This is just my opinion; I have never worked on this type of system personally.
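As a minimal sketch of the per-gene ranking idea, the Python below scores each gene alignment by its mean pairwise identity and sorts the genes from most to least conserved. Mean pairwise identity is a simpler stand-in swapped in for dN/dS (a proper dN/dS estimate needs a codon-aware method such as PAML's codeml). The input layout (one dict of strain name to aligned sequence per gene) and the gene and strain names in the toy example are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues, counting only columns where neither sequence has a gap."""
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    return identical / compared if compared else 0.0


def mean_identity(alignment: dict[str, str]) -> float:
    """Average pairwise identity over all strain pairs in one gene alignment."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        return 1.0  # a single sequence is trivially "conserved"
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def rank_genes(gene_alignments: dict[str, dict[str, str]]) -> list[tuple[str, float]]:
    """Sort genes from most to least conserved by mean pairwise identity."""
    scores = {gene: mean_identity(aln) for gene, aln in gene_alignments.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy input: gene -> {strain -> aligned sequence}
toy = {
    "geneA": {"s1": "ATGAAA", "s2": "ATGAAA", "s3": "ATGAAG"},
    "geneB": {"s1": "ATGCCC", "s2": "ATGTTT", "s3": "ATG-CC"},
}
for gene, score in rank_genes(toy):
    print(f"{gene}\t{score:.3f}")  # geneA (0.889) is listed before geneB (0.700)
```

Identity ignores whether a change is synonymous, so it will rank fast-evolving but functionally constrained genes lower than dN/dS would; treat it only as a first filter.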
It's a very old post, but I thought I could add to it to help others who want to do a similar analysis, i.e. build phylogenies from whole genomes of prokaryotic species. I have created a basic analysis pipeline that tries to simplify building species-level phylogenetic trees using only the conserved (otherwise known as core) genomic content of all the 'bacterial' species.
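Selecting the core genes boils down to a set intersection. The sketch below is not the pipeline mentioned above, just an illustration of the idea: it assumes you already have gene families (ortholog clusters) with a copy number per strain, for example from a clustering tool, and the input layout and the family and strain names are hypothetical.

```python
def core_families(presence: dict[str, dict[str, int]], single_copy: bool = True) -> set[str]:
    """Return gene families present in every strain.

    `presence` maps strain -> {family_id: copy_number} (hypothetical input layout).
    With single_copy=True only families with exactly one copy in every strain
    are kept, which avoids mixing paralogs into the alignment.
    """
    if not presence:
        return set()
    strains = list(presence)
    # Intersect the families observed (copy number > 0) in each strain.
    core = set.intersection(
        *({fam for fam, n in counts.items() if n > 0} for counts in presence.values())
    )
    if single_copy:
        core = {fam for fam in core if all(presence[s][fam] == 1 for s in strains)}
    return core


presence = {
    "strainA": {"fam1": 1, "fam2": 1, "fam3": 2},
    "strainB": {"fam1": 1, "fam3": 1},
    "strainC": {"fam1": 1, "fam2": 1, "fam3": 1},
}
print(sorted(core_families(presence, single_copy=False)))  # ['fam1', 'fam3']
print(sorted(core_families(presence)))                     # ['fam1']
```

Restricting to single-copy families is a common choice because a duplicated family (like `fam3` in `strainA`) makes it ambiguous which copy should go into the alignment.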
Hal does it, although it's already quite old and possibly painful to install and get working. I made a Bash script that uses HMMER to extract ribosomal proteins from proteomes and then builds a concatenated alignment from them with MUSCLE and Gblocks. I then use RAxML and PhyloBayes for tree building. Perhaps you could do something similar?
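The concatenation step could look roughly like the sketch below: per-gene alignments (for example MUSCLE output after Gblocks trimming, already read into Python) are joined into one supermatrix, a taxon missing from a gene is padded with gaps, and the gene boundaries are written as RAxML-style partition lines. The dict-based input, the gene and taxon names, and the default `LG` model are assumptions for illustration, not part of the script described above.

```python
def concatenate(
    gene_alignments: dict[str, dict[str, str]], taxa: list[str]
) -> tuple[dict[str, str], list[tuple[str, int, int]]]:
    """Join per-gene alignments (gene -> {taxon -> aligned seq}) into a supermatrix.

    Returns the supermatrix (taxon -> concatenated seq) and 1-based inclusive
    (gene, start, end) coordinates for each gene block.
    """
    chunks = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: empty alignment or unequal sequence lengths")
        width = widths.pop()
        for taxon in taxa:
            # Pad taxa that have no hit for this gene with gaps.
            chunks[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((gene, start, start + width - 1))
        start += width
    return {t: "".join(parts) for t, parts in chunks.items()}, partitions


def to_fasta(matrix: dict[str, str]) -> str:
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in matrix.items())


def to_partition_lines(partitions, model: str = "LG") -> str:
    # One "MODEL, name = start-end" line per gene (model is an assumption; adjust as needed).
    return "".join(f"{model}, {gene} = {s}-{e}\n" for gene, s, e in partitions)


# Hypothetical toy input
genes = {
    "rplB": {"A": "MKV", "B": "MRV"},
    "rpsC": {"A": "GG", "C": "GA"},
}
matrix, parts = concatenate(genes, ["A", "B", "C"])
print(to_fasta(matrix))
print(to_partition_lines(parts))
```

A taxon that is padded for many genes carries a lot of missing data, so it may be worth dropping genes that are absent from more than a few taxa before concatenating.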
Can I use a similar approach with nucleotide (whole-genome) sequences? If I align a few genomes with Mauve, do I then need to extract the conserved regions, or is the tree generated by Mauve enough to show the phylogenetic relationships among all the genomes?
updated 2.7 years ago by Ram • written 9.9 years ago by HG
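If you do go the nucleotide route, one way to "extract the conserved region" from a Mauve alignment is to keep only the locally collinear blocks (LCBs) that contain every genome and concatenate them. The sketch below assumes the XMFA layout Mauve writes (header lines like `> 1:1-5000 + genome1.fasta`, blocks terminated by a line starting with `=`, comment lines starting with `#`) and that genomes absent from a block are simply not listed; please check this against your own output. The `min_length` cutoff and the toy input are hypothetical.

```python
def parse_xmfa(text: str) -> list[dict[str, str]]:
    """Split XMFA text into blocks mapping genome index -> aligned sequence."""
    blocks: list[dict[str, str]] = []
    current: dict[str, str] = {}
    seq_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment/metadata lines
        if line.startswith("="):
            # "=" closes the current block (LCB)
            if current:
                blocks.append(current)
            current, seq_id = {}, None
        elif line.startswith(">"):
            # e.g. "> 1:1-5000 + genome1.fasta": keep the index before ':'
            seq_id = line[1:].strip().split(":", 1)[0]
            current[seq_id] = ""
        elif seq_id is not None:
            current[seq_id] += line
    if current:
        blocks.append(current)
    return blocks


def concat_core_blocks(blocks, n_genomes: int, min_length: int = 0) -> dict[str, str]:
    """Concatenate blocks that contain all genomes 1..n_genomes and span >= min_length columns."""
    wanted = {str(i) for i in range(1, n_genomes + 1)}
    core = [b for b in blocks if set(b) == wanted and len(b["1"]) >= min_length]
    return {gid: "".join(b[gid] for b in core) for gid in sorted(wanted, key=int)}


# Hypothetical toy XMFA with two genomes; the middle block lacks genome 2.
toy_xmfa = """#FormatVersion Mauve1
> 1:1-6 + g1.fa
ATGAAA
> 2:1-6 + g2.fa
ATGAAG
=
> 1:10-13 + g1.fa
CCCC
=
> 1:20-23 - g1.fa
GG
TT
> 2:30-33 + g2.fa
GGTA
=
"""
core_aln = concat_core_blocks(parse_xmfa(toy_xmfa), n_genomes=2)
for gid, seq in core_aln.items():
    print(f">genome{gid}\n{seq}")
```

The resulting FASTA can be passed to any tree builder; on real data a non-zero `min_length` helps discard short, poorly aligned blocks.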
Unless you're dealing with > 99% similar genomes, you should really do your analysis on the protein level.
I had tried it, but, as usual with other tools, I got a bug report.