Printable visualizations of large-scale alignments
6.0 years ago
Michael G ▴ 80

I would like to visualize a large ("large" as in 20 bacterial genomes) multi-sequence alignment such that (a) the sequences are wrapped within pages, (b) the individual nucleotide letters remain visible (at least minutely), and (c) nucleotide differences compared to a consensus are highlighted. I wish to convert these alignments into PDF documents for subsequent printing.

Popular alignment viewers such as AliView or Geneious struggle with such alignment visualizations. AliView cannot provide a wrapped layout, whereas Geneious hangs on large alignments (24 GB of RAM is insufficient for an alignment of 20 bacterial genomes).

Do you have any software suggestions?

Note: One option may be to use a console-based alignment viewer (such as alan or alv), if the resulting visualizations could be passed to cups-pdf.

Edit 1: The R package msa was among my earlier attempts. While it is a wonderful tool for generating small, publication-ready visualizations, it too stalls once the alignment becomes too large. For example, an alignment of 20 bacterial genomes is not converted within two hours of executing the command myFirstAlignment <- msa(mySequences). Given that msa essentially wraps TeXShade, I would guess that the processing time with TeXShade itself would not be much shorter either.

alignment sequence • 3.1k views

I won't add this as an answer just yet since I'm not sure how well it would handle it either, but the other 2 options that occur to me are SeaView (which I think can save PDF or maybe PostScript), and ESPript.

I can't attest to how well either will deal with large data though.

Can I ask why it is you need such a large alignment 'printed'? It doesn't seem like it would be very useful.


When you say convert to PDF, do you want them to still be vector graphics? Or would bitmaps work too?

6.0 years ago
Michael G ▴ 80

I found that I can achieve what I was looking for using the command-line alignment viewer alv. It may not be super pretty, but it is fast:

alv myReallyLargeAlignment.fasta -t dna -k -w 300 | aha | wkhtmltopdf - soughtVisualization.pdf
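If alv itself misbehaves on a given input, the same idea (wrap the alignment, keep every letter visible, mark differences from a consensus, emit HTML for a PDF converter) can be sketched in a few lines of Python. This is only a rough sketch under several assumptions of mine, not anything alv does internally: the input is an aligned FASTA file that fits in memory, the consensus is a simple per-column majority vote, and the names read_fasta, consensus and wrapped_html are made up for illustration. I have not tried it on genome-sized alignments.

```python
from collections import Counter
from html import escape
from typing import Dict, Iterator, List


def read_fasta(path: str) -> Dict[str, str]:
    """Parse an aligned FASTA file into {name: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def consensus(seqs: List[str]) -> str:
    """Majority-vote consensus per column; ties go to the first symbol seen."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def wrapped_html(aln: Dict[str, str], width: int = 100) -> Iterator[str]:
    """Yield HTML lines: wrapped blocks, mismatches to the consensus in <mark>."""
    seqs = list(aln.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    cons = consensus(seqs)
    label = max(len(n) for n in list(aln) + ["consensus"])
    yield "<pre style='font-size:6pt'>"
    for start in range(0, len(cons), width):
        block = cons[start:start + width]
        # Consensus row first, then one row per sequence for this block.
        yield f"{'consensus':<{label}} {start + 1:>10} {block}"
        for name, seq in aln.items():
            cells = "".join(
                escape(base) if base == ref else f"<mark>{escape(base)}</mark>"
                for base, ref in zip(seq[start:start + width], block)
            )
            yield f"{escape(name):<{label}} {start + 1:>10} {cells}"
        yield ""  # blank line between wrapped blocks
    yield "</pre>"


def write_html(fasta_path: str, html_path: str, width: int = 100) -> None:
    """Stream the rendering to disk so the HTML is never held in memory."""
    with open(html_path, "w") as out:
        for line in wrapped_html(read_fasta(fasta_path), width):
            out.write(line + "\n")
```

Calling write_html("myReallyLargeAlignment.fasta", "alignment.html", width=300) should give an HTML file that can then be converted with wkhtmltopdf alignment.html alignment.pdf. The font size and line width are arbitrary and would need tuning for the paper format.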

In addition, the alignment viewer belvu from the SeqTools package can also generate visualizations quickly, although only the first 10 kbp or so are visualized (depending on the line-wrap setting).


I get an error from this ("Alv bug! Please report!") and no usable output.

6.0 years ago
thackl ★ 3.0k

If you want PDF, there's a powerful LaTeX package, TeXShade: http://ftp.cvut.cz/tex-archive/macros/latex/contrib/texshade/texshade.pdf

And if you don't want to deal with TeX, there's an R interface, too: https://rdrr.io/bioc/msa/man/msaPrettyPrint.html


Beat me to it! TeXShade would be my suggestion. I used it extensively in my thesis. Just be aware that it can add some compile time to the document.


@jrj.healey In theory I would agree, but compiling a TeXShade section (with 150,000+ bp) in a LaTeX document would take ages. I am looking for something more lightweight.


@thackl The R package msa was among my earlier attempts, but it is not a workable solution. It stalls when given a multi-sequence alignment of 20 or more bacterial genomes. Likewise, TeX documents with TeXShade sections would take ages to compile (if they compile at all) with such input alignments. In my experience, the only software tools that can open bacterial-genome-sized alignments for visualization in a reasonable time frame seem to be terminal/console-based tools (such as the ones mentioned above).


Ah, good to know. Haven't used it for alignments that big yet.
