Is there a multiple sequence alignment format that is totally human readable? Clustalw comes close for me, but I wanted to add coordinates also.
If by coordinates, you mean "genomic coordinates", then the MAF
format is the way to go.
a score=23262.0
s hg18.chr7 27578828 38 + 158545518 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG
s panTro1.chr6 28741140 38 + 161576975 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG
s baboon 116834 38 + 4622798 AAA-GGGAATGTTAACCAAATGA---GTTGTCTCTTATGGTG
s mm4.chr6 53215344 38 + 151104725 -AATGGGAATGTTAAGCAAACGA---ATTGTCTCTCAGTGTG
s rn3.chr4 81344243 40 + 187371129 -AA-GGGGATGCTAAGCCAATGAGTTGTTGTCTCTCAATGTG
A bunch of tools and libraries support this format, including Jim Kent's tools and bx-python.
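To show how the coordinates in the block above can be read off, here is a minimal Python sketch. It assumes the usual layout of an "s" line (source, zero-based start, ungapped size, strand, source size, aligned text); the names `MafRow`, `parse_s_line` and `column_coords` are made up for illustration and do not come from bx-python or the Kent tools.

```python
from typing import List, NamedTuple, Optional


class MafRow(NamedTuple):
    src: str        # e.g. "hg18.chr7"
    start: int      # zero-based start of the aligned region on the given strand
    size: int       # number of non-gap bases in this row
    strand: str     # "+" or "-"
    src_size: int   # total length of the source sequence
    text: str       # aligned bases, with "-" for gaps


def parse_s_line(line: str) -> MafRow:
    """Parse one MAF 's' line into its six fields."""
    tag, src, start, size, strand, src_size, text = line.split()
    if tag != "s":
        raise ValueError(f"not an 's' line: {line!r}")
    return MafRow(src, int(start), int(size), strand, int(src_size), text)


def column_coords(row: MafRow) -> List[Optional[int]]:
    """For each alignment column, give the zero-based source position, or None for a gap."""
    coords: List[Optional[int]] = []
    pos = row.start
    for ch in row.text:
        if ch == "-":
            coords.append(None)
        else:
            coords.append(pos)
            pos += 1
    return coords


row = parse_s_line(
    "s hg18.chr7 27578828 38 + 158545518 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG"
)
coords = column_coords(row)
print(coords[0], coords[3], coords[4])
```

Note that for rows on the "-" strand the start is counted on the reverse-complemented source, so the positions returned here would still need converting if you want forward-strand coordinates.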
Hey,
I think there are two questions here. The first concerns the multiple sequence alignment format itself; on this point I recommend, as others have done here, CLUSTAL or perhaps interleaved PHYLIP. The second is about visualization: there are some nice tools, such as Jalview or SeaView, that let you view your alignment in a really pretty way independently of the MSA format. These viewers can also give you information about position conservation, the consensus sequence, etc. I hope this makes sense to you.
As far as I know, the Clustal output format has been around for years because of its usability. Considering that each sequence line has 60 bases after 16 tag characters, it should be straightforward to calculate the position of each base. Maybe it is not "totally human readable", but a quick alignment parser could be of great help. I'm thinking of something like parsing the Clustal output and generating, for instance, an HTML file in which each base is "tooltipped" with its position: you would then have the readability of Clustal output plus the tooltip benefits of basic HTML display.
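A rough sketch of that idea in Python might look like the following. The function names `parse_clustal` and `to_html` are hypothetical, the toy alignment is invented for the example, and the parser splits on whitespace rather than relying on exactly 16 name characters, in case the width of the name field differs between programs. Positions in the tooltips are 1-based and count only non-gap characters.

```python
import html


def parse_clustal(text: str) -> dict:
    """Collect aligned sequences from Clustal-formatted text, keyed by sequence name."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines (which start with a space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]  # an optional third field is a residue count
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def to_html(seqs: dict, width: int = 60) -> str:
    """Render the alignment in <pre> blocks, with each base carrying a 'name:position' tooltip."""
    counters = {name: 0 for name in seqs}  # running ungapped position per sequence
    length = max((len(s) for s in seqs.values()), default=0)
    out = ["<pre>"]
    for start in range(0, length, width):
        for name, seq in seqs.items():
            cells = []
            for ch in seq[start:start + width]:
                if ch == "-":
                    cells.append("-")
                else:
                    counters[name] += 1
                    tip = html.escape(f"{name}:{counters[name]}", quote=True)
                    cells.append(f'<span title="{tip}">{html.escape(ch)}</span>')
            out.append(html.escape(name).ljust(16) + "".join(cells))
        out.append("")
    out.append("</pre>")
    return "\n".join(out)


# Toy input, invented for illustration only.
example = """CLUSTAL W (1.83) multiple sequence alignment

seq1            AAA-GGG
seq2            -AATGGG
                 ** ***
"""

seqs = parse_clustal(example)
page = to_html(seqs)
print(page)
```

Opening the resulting text in a browser (wrapped in a minimal HTML page) shows the alignment as usual, and hovering over a base reveals its position in the ungapped sequence.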
As a format for future alignment-tool developers, I would suggest a modified Clustal output containing a header for each block, similar to the PHYLIP format's header (number of sequences and sequence length), but also with a value for the starting base of each alignment block. At the least, that would help in positioning each base within each block.
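To make the suggestion concrete, here is one possible reading of it as a small Python sketch. This is not an existing format: the header layout (number of sequences, alignment length, 1-based first column of the block) and the function name `write_blocks` are assumptions made up for this example.

```python
def write_blocks(seqs: dict, width: int = 60) -> str:
    """Write aligned sequences in Clustal-like blocks, each preceded by a header line.

    Hypothetical header fields: number of sequences, alignment length,
    and the 1-based alignment column at which the block starts.
    """
    n = len(seqs)
    length = max((len(s) for s in seqs.values()), default=0)
    lines = []
    for start in range(0, length, width):
        lines.append(f"{n} {length} {start + 1}")
        for name, seq in seqs.items():
            # Names are left-justified in a 16-character field, as in Clustal output.
            lines.append(f"{name:<16}{seq[start:start + width]}")
        lines.append("")
    return "\n".join(lines)


# Toy alignment, invented for illustration; a narrow width forces two blocks.
toy = {"seq1": "AAA-GGG", "seq2": "-AATGGG"}
blocks = write_blocks(toy, width=4)
print(blocks)
```

With a header like this, the column of any base is simply the block's starting value plus its offset within the line.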
ClustalW 2 and Clustal Omega can produce both the standard version of the Clustal alignment format and one with sequence coordinates included:
You can try showalign or prettyplot from EMBOSS: the first is black and white, the second uses colors.
EMBL-EBI provide an on-line service for MView (http://www.ebi.ac.uk/Tools/msa/mview) which can be useful for evaluation of the software and the various multiple alignment reformatting and visualisation options available in MView.
The List of alignment visualization software provides pointers to a wide range of sequence alignment editors and visualisation tools.