Question

Is SNP Data Obtained From Mauve Progressive Alignment Useful For Phylogeny Of Bacterial Whole Genomes?

0

12.6 years ago

Naren ★ 1.0k

I have SNP data for multiple genomes, obtained from a progressiveMauve alignment.
How can I use this data to build a phylogenetic tree of those genomes?

Sample data: (For 3 genomes)

SNP    sequence_1    sequence_2    sequence_3
ACA    4             4             4
AAG    9             9             9
CTC    10            10            10
TGT    12            12            12
GAG    15            15            15
TCT    18            18            18
... and so on, until the last position.

(The numbers are positions in each whole genome.)

snp phylogeny • 6.0k views

ADD COMMENT • link updated 12.6 years ago by aidan-budd 1.9k • written 12.6 years ago by Naren ★ 1.0k

score 1 · Answer 1 · 2013-02-07

1

12.6 years ago

aidan-budd 1.9k

I've no experience or knowledge about using SNP data in this format for estimating a phylogenetic tree.

My (strong!) preference when it comes to tree building is to use explicit probabilistic evolutionary-process based models - almost always focused, if working on nucleotide sequences, on the evolutionary process of base substitution.

There are lots and lots of software packages out there that use such models, and which take as input a multiple sequence alignment of your sequences rather than a list of SNPs.

Without, as I say, any experience working with a list such as you describe (maybe this is common practice in some contexts?!), I would recommend that you instead get your hands on data which can be transformed e.g. into a FASTA-format multiple sequence alignment file, which can be used (or adapted) as input to software such as PhyML, RAxML, MrBayes etc.

I notice, looking at the PLoS One progressiveMauve paper, that they say

"The alignment can also be used to extract variable sites for more traditional phylogenetic analyses."

which suggests to me that you may be able to get something like this out of the aligner.
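As a rough illustration of the "transform into a FASTA alignment" idea, here is a minimal Python sketch that turns rows like those in the question into one sequence of concatenated variable sites per genome. It assumes that the i-th character of each SNP pattern is the base found in the i-th sequence column; that reading of the table is an assumption, not something confirmed in this thread. The function name `snp_table_to_fasta` and the default names `sequence_1`, `sequence_2`, ... are made up for the example.

```python
def snp_table_to_fasta(rows, names=None):
    """Build a FASTA alignment of concatenated SNP sites.

    Assumption: in a row such as 'ACA 4 4 4', the i-th character of the
    pattern ('ACA') is the base observed in the i-th genome.
    """
    columns = None  # one list of bases per genome
    for row in rows:
        fields = row.split()
        if not fields or fields[0].upper() == "SNP":
            continue  # skip blank lines and the header row
        pattern = fields[0].upper()
        if columns is None:
            columns = [[] for _ in pattern]
        if len(pattern) != len(columns):
            raise ValueError(f"unexpected pattern length in row: {row!r}")
        for column, base in zip(columns, pattern):
            column.append(base)
    if columns is None:
        raise ValueError("no SNP rows found")
    if names is None:
        # Hypothetical default names, mirroring the header in the question.
        names = [f"sequence_{i + 1}" for i in range(len(columns))]
    if len(names) != len(columns):
        raise ValueError("number of names does not match number of genomes")
    records = [f">{name}\n{''.join(column)}" for name, column in zip(names, columns)]
    return "\n".join(records) + "\n"


# The six sample rows from the question.
sample_rows = [
    "SNP sequence_1 sequence_2 sequence_3",
    "ACA 4 4 4",
    "AAG 9 9 9",
    "CTC 10 10 10",
    "TGT 12 12 12",
    "GAG 15 15 15",
    "TCT 18 18 18",
]

fasta = snp_table_to_fasta(sample_rows)
print(fasta)
```

The resulting text could be saved to a file and converted to whatever input format the tree-building program expects. Bear in mind that an alignment made only of variable sites contains no invariant columns, which model-based programs generally assume are present, so it is worth checking how your chosen program handles that.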

0

Thank you for your response. I made a small mistake when asking: instead of "Phylogeny" I should have said "Clustering". I have just found a way to do the clustering in the Statistica package. However, your suggestion of doing the phylogeny from a multiple alignment file (with a little formatting) in PhyML or RAxML is a better option.
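For the clustering route, one simple starting point is a matrix of pairwise SNP differences between genomes, which most clustering tools can take as input. The sketch below is only an illustration: the function name `pairwise_snp_differences` is hypothetical, and the three strings are the per-genome bases read off the sample table, assuming the i-th letter of each SNP pattern belongs to the i-th genome.

```python
from itertools import combinations


def pairwise_snp_differences(sequences):
    """Count differing SNP sites for every pair of genomes.

    `sequences` maps a genome name to its string of bases at the SNP sites;
    all strings must have the same length.
    """
    differences = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
        if len(seq_a) != len(seq_b):
            raise ValueError(f"{name_a} and {name_b} differ in length")
        # Number of sites at which the two genomes carry different bases.
        differences[(name_a, name_b)] = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences


# Bases per genome for the six sample SNPs (assumed column order, see above).
sequences = {
    "sequence_1": "AACTGT",
    "sequence_2": "CATGAC",
    "sequence_3": "AGCTGT",
}

differences = pairwise_snp_differences(sequences)
for pair, count in differences.items():
    print(pair, count)
```

Raw difference counts like these ignore any model of how substitutions arise, which is the main contrast with the model-based tree-building programs mentioned above.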

ADD REPLY • link 12.6 years ago by Naren ★ 1.0k

0

Thanks for the feedback, /\/ari - nice/motivating to have the feeling that my posts are indeed being read by the question posters :)

One further comment I'd make (to your comment) is that, meh, I wouldn't say that PhyML/RAxML are necessarily the better options - it rather (as always in bioinformatics - and I guess in life in general?!) depends on why you are doing the analysis and the specific outcomes you need from it. If you're interested in making inferences about evolutionary processes from your data, then I'd recommend using tool(s) that incorporate explicit evolutionary models - but there may well be other applications/sets of questions you're interested in where alternative approaches are more useful.

ADD REPLY • link 12.6 years ago by aidan-budd 1.9k