Hi,
When using DNA as input data for OMA, how do alignment scores compare to those of amino acid alignments? I am wondering specifically whether the default MinScore threshold applies equally well to this kind of data, or whether one should consider changing it.
Hi,
when you use OMA standalone with DNA sequences, the program uses an empirical scoring matrix derived from DNA sequences. The Smith-Waterman dynamic programming algorithm then computes the optimal local alignment using a scoring matrix in which 50% of the sites are expected to have undergone a mutation. The score of this local alignment can be seen as the sum, over the aligned positions, of 10*log10(prob(positions are homologous)/prob(unrelated positions)), so the score is not length independent: a longer alignment of the same quality scores higher. MinScore is simply the threshold above which an alignment is taken as evidence that the two sequences originate from a common ancestor (i.e. are homologous). A suitable MinScore value can thus depend on the type of DNA sequences you are looking at, e.g. full genes vs. coding sequences. So experimenting with the MinScore setting might be useful, but the default value should not be completely off either.
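To make the length dependence concrete, here is a minimal Python sketch of the log-odds sum described above. It is not OMA code: the per-position probabilities, the function names and the threshold of 200 are all made up for illustration, and a real Smith-Waterman score also includes gap penalties.

```python
import math

def position_score(p_homologous: float, p_unrelated: float) -> float:
    """Log-odds contribution of one aligned position, in 10*log10 units."""
    return 10.0 * math.log10(p_homologous / p_unrelated)

def alignment_score(positions) -> float:
    """Sum the per-position log-odds over an ungapped alignment."""
    return sum(position_score(p_h, p_u) for p_h, p_u in positions)

# Hypothetical numbers: each aligned position is twice as likely under
# homology as under the unrelated model (about 3.01 score units per site).
per_site = (0.2, 0.1)
short_score = alignment_score([per_site] * 50)
long_score = alignment_score([per_site] * 100)

# Illustrative threshold only; not OMA's default MinScore.
example_threshold = 200.0
short_passes = short_score >= example_threshold
long_passes = long_score >= example_threshold

print(f"50 sites:  {short_score:.1f}  passes: {short_passes}")
print(f"100 sites: {long_score:.1f}  passes: {long_passes}")
```

With identical per-site evidence, the 100-site alignment scores twice as high as the 50-site one, so a fixed threshold effectively also sets a minimum amount of aligned sequence. That is why the best MinScore can differ between, say, short coding sequences and full genes.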
OMA standalone has not been applied to DNA datasets at large scale, nor benchmarked extensively on them. We recommend using OMA standalone with DNA input only for smaller datasets with relatively little divergence. In other cases, we recommend using protein sequences, which retain evolutionary signal over much longer evolutionary distances.
Cheers, Adrian