Phylogenetic Distance From 16S rRNA Using ssu-align
13.3 years ago
Alf ▴ 490

My problem is very simple: I need software that can produce a phylogenetic distance matrix for a set of species for which I have collected 16S rRNA sequences. I cannot assume that the species appear in any particular database, so I cannot use precomputed alignments.

Initially I wanted to use ClustalW for the alignment, but I read about rRNA secondary structure, and I found a paper from the SILVA team which clearly specifies "don't use software like Clustal, MUSCLE or similar", arguing that those aligners rely only on the sequence data and not on the secondary structure of the rRNA (and therefore do not perform very well). I found a very nice tool, which has been discussed on this site, called ssu-align (which uses that secondary structure). It is made by the people at the Eddy lab (those who made HMMER), and it is pretty fast. After using ssu-align (and masking the alignment), three alignment files are produced: one for (possibly) bacterial sequences, one for archaeal sequences and one for eukaryotic sequences (my dataset contains sequences from all three domains). My problem is how to combine the three alignments so that I can compute the phylogenetic distance among all the species:

  • Does it make sense to combine them into a single alignment? If it does, what would be the proper way to do this? According to the ssu-align documentation, the program ssu-merge seems to have been designed for this, but I get an error.
  • If the tool I've chosen is not suitable for this task, could you please recommend another option (preferably a free software tool, and not a web server)?
  • I assumed that you need a single multiple alignment file to infer phylogenetic distance, but maybe this is not true. Is it possible to get the distance matrix having three alignment files instead of just one?
  • Obviously, if someone can propose another methodology or comment for my pipeline starting with just rRNA sequences, I would really appreciate it.

Note that I don't need to have a perfect distance estimation, I just need a rough estimation which, in a way, resembles the reality (but which could be accepted by a reviewer :) ).

phylogenetics distance • 5.2k views

Can you be a bit more specific about the error you obtained with ssu-merge?

13.3 years ago
Alf ▴ 490

Sorry for the self-answer. I received a very detailed email from the author (Eric P. Nawrocki) of the program, which I will summarize here to provide help:

There are basically two solutions:

  1. Align all of the sequences to a single model (for instance bacteria) and use a strict mask to remove positions that are not homologous between all three domains. To do this, the option '-n bacteria' in 'ssu-align' can be used. This forces the program to use only the bacterial model. After that, a strict mask (for example with 'ssu-mask --pf 0.9999 --pt 0.9999') can be applied.
  2. Use a different precomputed model from the three predefined ones. The author suggests the Rfam 10.0 RF00177 CM, using the option '-m RF00177.cm' in ssu-align. Afterwards, a strict mask (as in option 1) can be applied.
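As a rough illustration of option 1, here is a minimal Python sketch that only assembles the two command lines. The flags ('-n bacteria', '--pf', '--pt') are the ones quoted above. The positional argument order (sequence file then output directory for ssu-align, output directory for ssu-mask) and the file names `my_16S.fasta` and `my_16S_out` are assumptions on my part, so check them against the ssu-align user guide before running anything.

```python
import shlex


def build_commands(fasta, outdir, model="bacteria", pf=0.9999, pt=0.9999):
    """Assemble argv lists for the align-then-mask steps of option 1.

    Assumption: ssu-align takes <sequence file> <output dir> and
    ssu-mask takes <output dir>; verify against the user guide.
    """
    # Force every sequence onto one model so a single alignment is produced.
    align = ["ssu-align", "-n", model, fasta, outdir]
    # Strict mask: keep only columns passing both thresholds.
    mask = ["ssu-mask", "--pf", str(pf), "--pt", str(pt), outdir]
    return align, mask


# Hypothetical file names, for illustration only.
align_cmd, mask_cmd = build_commands("my_16S.fasta", "my_16S_out")
print(shlex.join(align_cmd))
print(shlex.join(mask_cmd))
```

The lists can be passed to `subprocess.run(..., check=True)` once ssu-align is installed; printing them first makes it easy to inspect what would be executed.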

About ssu-merge, the author says:

Regarding 'ssu-merge', as you've no doubt figured out for yourself, it wasn't meant for combining alignments from different models, but rather only combining alignments to the same model. This is potentially useful if you are aligning many thousands of sequences and split up the job onto many cpus, all resulting alignments can be merged into one using ssu-merge.

13.3 years ago
Joseph Hughes ★ 3.0k

If I understand correctly, you want to determine the distances between all your 16S rRNA sequences whether they are archaeal, bacterial or eukaryotic, i.e. you don't just want domain-specific distances. If that is the case, you will need to ssu-merge your alignments before doing the masking with ssu-mask. I imagine that combining the three domains in this way will result in a large number of columns being removed from the analysis, but you should be able to calculate distances reliably from this final combined and masked alignment.


Nope. ssu-merge is meant for merging several alignments from the same domain. It's impossible (for the moment) to merge alignments from, let's say, bacteria and eukaryotes, although one can get the impression that it is possible from reading the documentation (this has been confirmed to me by the author of the program). One solution he suggested is to force all sequences to be aligned to the same model (bacteria, for example), which then yields only one alignment.
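Once there is a single masked alignment, a rough distance matrix can be computed from it directly. The following is a minimal sketch, not part of ssu-align: it assumes the alignment has already been converted to aligned FASTA, and it uses the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) as one simple choice of distance (other substitution models would give different values). The function names and the toy sequences are made up for illustration.

```python
import math


def read_aligned_fasta(text):
    """Parse aligned FASTA text into {name: sequence}, using the RNA alphabet."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper().replace("T", "U"))
    return {k: "".join(v) for k, v in seqs.items()}


def jc69_distance(a, b):
    """Jukes-Cantor distance over columns where both sequences have A/C/G/U."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    valid = set("ACGU")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x in valid and y in valid:  # skip gaps and ambiguity codes
            compared += 1
            mismatches += x != y
    if compared == 0:
        return float("nan")  # no shared columns to compare
    if mismatches == 0:
        return 0.0
    p = mismatches / compared
    if p >= 0.75:
        return float("inf")  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """Return (names, square matrix) of pairwise distances."""
    names = list(seqs)
    matrix = [[jc69_distance(seqs[i], seqs[j]) for j in names] for i in names]
    return names, matrix


# Toy alignment, invented for illustration.
example = """>sp1
ACGUACGU-A
>sp2
ACGUACGA-A
>sp3
ACGAACGA.A
"""
names, matrix = distance_matrix(read_aligned_fasta(example))
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.4f}" for d in row))
```

Because the strict mask leaves only columns that are homologous across all domains, the number of compared columns may be small, so the resulting distances are best treated as rough estimates.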
