Hi.
I am trying to extract useful information from a multiple sequence alignment (MSA) of bacterial 16S rRNA genes, for example (to keep it simple) the average length of conserved and non-conserved regions. I suspect it may be incorrect to simply build an MSA with, say, Clustal, MAFFT or T-Coffee, then measure the lengths of those regions and draw conclusions from them. Could you suggest an approach I should try?
To be clearer, I want to use the information extracted from these MSAs during assembly of 16S rRNA genes from NGS reads (potentially even from metagenomic reads).
I am also looking for conservation estimates of 16S rRNA genes and how they depend on the evolutionary distance between the species included. For example, an estimate computed from an MSA of 16S rRNA genes from a single genus G will differ from one computed from G together with some other genus. What approaches exist for exploring this?
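To make the second question concrete, here is a minimal sketch of the kind of estimate I mean. It assumes the sequences are already aligned (equal-length strings, gaps as "-") and uses mean per-column Shannon entropy as the conservation measure, which is only one possible choice. The function names and the toy sequences are made up for illustration and are not real 16S data.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column, ignoring gaps and ambiguity codes."""
    residues = [c.upper() for c in column if c.upper() in "ACGTU"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_entropy(aligned_seqs):
    """Average column entropy over an alignment; lower means more conserved."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = [column_entropy(col) for col in zip(*aligned_seqs)]
    return sum(entropies) / len(entropies)


# Toy alignment: three sequences standing in for genus G, two for another genus.
genus_g = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
other_genus = ["ACGAACTT", "ACGAACTT"]

within_g = mean_entropy(genus_g)
g_plus_other = mean_entropy(genus_g + other_genus)
print(f"within G: {within_g:.3f} bits, G + other genus: {g_plus_other:.3f} bits")
```

On this toy input the combined set has the higher mean entropy, which is the dependence on taxonomic breadth that I would like to quantify properly.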
Thank you.
I kinda feel the same way. I think that knowing the (estimated) lengths of the conserved regions vs. the hypervariable regions should give, for example, a faster and more straightforward way to align reads against the gene sequences.
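As a rough sketch of how such lengths could be estimated from an existing alignment: call a column conserved when its most common character reaches some fraction of the sequences, then measure the runs of consecutive conserved and variable columns. The 0.9 threshold, the function names and the toy alignment are my own illustrative assumptions, and gaps are simply counted as a character here.

```python
from collections import Counter
from itertools import groupby


def is_conserved(column, min_fraction=0.9):
    """True if the most common character in the column reaches min_fraction."""
    counts = Counter(c.upper() for c in column)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(column) >= min_fraction


def region_lengths(aligned_seqs, min_fraction=0.9):
    """Return (conserved_run_lengths, variable_run_lengths) along the alignment."""
    flags = [is_conserved(col, min_fraction) for col in zip(*aligned_seqs)]
    conserved, variable = [], []
    for flag, run in groupby(flags):
        (conserved if flag else variable).append(len(list(run)))
    return conserved, variable


# Toy alignment: two conserved blocks around a two-column variable stretch.
toy_alignment = ["ACGTAAGGCT", "ACGTCCGGCT", "ACGTGGGGCT"]
conserved_runs, variable_runs = region_lengths(toy_alignment)
print("conserved runs:", conserved_runs)
print("variable runs:", variable_runs)
print("mean conserved length:", sum(conserved_runs) / len(conserved_runs))
```

The resulting run lengths depend heavily on the threshold and on which taxa are in the alignment, so they are only a starting point.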