Hello.
I have aligned the 88 contigs of an E. coli de novo assembly against the closest reference genome using WGvista. The aim is to identify, in detail, structural differences between the assembly and the reference. The output of WGvista is a multiple FASTA alignment (MFA) file, which gives pairwise alignments between the reference (one large sequence) and the contigs of the assembly.
An example of one of the alignments in the file is:
>NC_007946.1 NC_007946:228496-228726 (+)
AGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGA
ACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTT
TGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAGGGGGATAA
CTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
>FW NODE_5_length_400106_cov_36.4682:399876-400106 (+)
AGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGA
ACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTT
TGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAGGGGGATAA
CTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
= score = 231 type = M2 L1 = 5065741 L2 = 400106 AL1 = 231 AL2 = 231 P_ID = 100
My problem is that I want to view all of the assembly's alignments laid out together along the reference sequence, but the MFA file treats each alignment as a separate comparison. This may be the wrong approach entirely, so any alternatives are welcome.
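For what it's worth, one way to get such a view might be to parse the pairwise blocks and order them by their reference coordinates. The following is only a minimal sketch: it assumes every block looks like the example above (two headers of the form `>label seqid:start-end (strand)`, reference first, followed by one `= score = ...` line), which is inferred from that single record and not from any WGvista documentation. The names `Hit`, `parse_pairwise` and `layout` are made up for illustration.

```python
import re
from dataclasses import dataclass

# Assumed header shape, inferred from the one example record in this thread.
HEADER = re.compile(r"^>(\S+)\s+(\S+):(\d+)-(\d+)\s+\(([+-])\)")
STAT = re.compile(r"(\w+) = (\S+)")


@dataclass
class Hit:
    ref_start: int
    ref_end: int
    contig: str
    contig_start: int
    contig_end: int
    strand: str
    identity: float
    ref_seq: str
    contig_seq: str


def parse_pairwise(text):
    """Collect one Hit per pairwise block (reference first, contig second)."""
    hits, headers, seqs = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            m = HEADER.match(line)
            if m is None:
                raise ValueError(f"unexpected header: {line}")
            headers.append(m.groups())
            seqs.append([])
        elif line.startswith("="):
            # The statistics line closes the current block.
            if len(headers) != 2:
                raise ValueError("expected exactly two sequences per block")
            stats = dict(STAT.findall(line))
            ref, qry = headers
            hits.append(Hit(int(ref[2]), int(ref[3]), qry[1], int(qry[2]),
                            int(qry[3]), qry[4],
                            float(stats.get("P_ID", "nan")),
                            "".join(seqs[0]), "".join(seqs[1])))
            headers, seqs = [], []
        elif seqs:
            seqs[-1].append(line)
    return hits


def layout(hits):
    """Tab-separated table of all hits, ordered along the reference."""
    rows = ["ref_start\tref_end\tcontig\tcontig_start\tcontig_end\tstrand\tP_ID"]
    for h in sorted(hits, key=lambda h: h.ref_start):
        rows.append(f"{h.ref_start}\t{h.ref_end}\t{h.contig}\t"
                    f"{h.contig_start}\t{h.contig_end}\t{h.strand}\t{h.identity}")
    return "\n".join(rows)


# The record posted above, copied as-is.
SAMPLE = """\
>NC_007946.1 NC_007946:228496-228726 (+)
AGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGA
ACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTT
TGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAGGGGGATAA
CTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
>FW NODE_5_length_400106_cov_36.4682:399876-400106 (+)
AGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGA
ACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTT
TGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAGGGGGATAA
CTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
= score = 231 type = M2 L1 = 5065741 L2 = 400106 AL1 = 231 AL2 = 231 P_ID = 100
"""

sample_hits = parse_pairwise(SAMPLE)
print(layout(sample_hits))
```

On a whole file you would read the text with `open(path).read()` and pass it to `parse_pairwise`. Sorting by reference start gives a tiling of contigs along the reference, so gaps, overlaps, or contigs that switch strand between neighbouring rows are candidate places to look for structural differences.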
Mauve may be a better tool for this application. Have you tried it?
Thanks. I have given Mauve a try, but I didn't get as close to what I wanted as I did with WGvista.
PhyloVISTA appears to be able to use MFA files and should show what you want.
Funnily enough, I had looked at it because it mentioned the use of MFA files. Unfortunately, it assumes that every sequence in the file comes from a different species and that the file is a single multiple alignment, whereas my file contains many pairwise alignments. I reran Mauve, and its .alignment file is in an almost identical format to the WGvista output. Thanks for your interest!