Question

Help needed using the information in a multiple FASTA alignment (MFA) file

5.3 years ago

Ian 6.1k

Hello.

I have aligned the 88 contigs of an E. coli de novo assembly against the closest reference genome using WGvista. The aim is to identify structural differences in the alignment in detail. The output of WGvista is a multiple FASTA alignment (MFA) file, which gives pairwise alignments between the reference (one large sequence) and the contigs of the assembly.

An example of one of the alignments in the files is:

>NC_007946.1 NC_007946:228496-228726 (+)
AGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGA
ACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTT
TGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAGGGGGATAA
CTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
>FW NODE_5_length_400106_cov_36.4682:399876-400106 (+)
AGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGA
ACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTT
TGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAGGGGGATAA
CTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
= score = 231  type = M2  L1 = 5065741  L2 = 400106  AL1 = 231  AL2 = 231  P_ID = 100

My problem is that I want to be able to view all of the alignments to the assembly laid out alongside the reference sequence. However, the MFA file treats each alignment as a separate comparison. This may be the wrong approach entirely, so any alternatives are welcome.
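One possible way to get all of the pairwise blocks into a single view is to parse the file and sort the blocks by their reference coordinates. The following is only a minimal sketch. It assumes every block looks like the example above: two FASTA records with headers of the form `>LABEL NAME:START-END (STRAND)`, followed by a summary line starting with `=`. The function names `parse_blocks` and `layout` are hypothetical, and real WGvista output may contain variations this does not handle.

```python
import re
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed header shape, inferred from the single example block:
#   >LABEL NAME:START-END (STRAND)
HEADER = re.compile(r"^>(\S+)\s+(\S+):(\d+)-(\d+)\s+\(([+-])\)")
# Assumed summary line shape: "= key = value  key = value ..."
STAT = re.compile(r"(\w+)\s*=\s*(\S+)")


def parse_blocks(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per pairwise block: reference, contig and summary stats."""
    records: List[Dict] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"Unrecognised header: {line}")
            label, name, start, end, strand = match.groups()
            records.append({"label": label, "name": name, "start": int(start),
                            "end": int(end), "strand": strand, "seq": ""})
        elif line.startswith("="):
            # The summary line closes a block; exactly two records are expected.
            if len(records) != 2:
                raise ValueError(f"Expected 2 sequences before: {line}")
            yield {"ref": records[0], "contig": records[1],
                   "stats": dict(STAT.findall(line))}
            records = []
        elif records:
            # Sequence lines (possibly with gap characters) extend the last record.
            records[-1]["seq"] += line


def layout(blocks: Iterable[Dict]) -> List[Tuple]:
    """Return one row per block, ordered along the reference."""
    rows = [(b["ref"]["start"], b["ref"]["end"], b["contig"]["name"],
             b["contig"]["start"], b["contig"]["end"], b["contig"]["strand"],
             b["stats"].get("P_ID", ""))
            for b in blocks]
    return sorted(rows, key=lambda row: (row[0], row[1]))


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is supplied by the user, e.g. the MFA file written by the aligner.
    with open(sys.argv[1]) as handle:
        print("ref_start\tref_end\tcontig\tctg_start\tctg_end\tstrand\tP_ID")
        for row in layout(parse_blocks(handle)):
            print("\t".join(str(field) for field in row))
```

Sorting by reference start gives a table in which gaps or overlaps between consecutive rows, and changes of contig or strand, hint at where structural differences may lie. It is a listing of coordinates rather than a graphical view.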

alignment fasta • 1.7k views

updated 5.3 years ago by GenoMax 147k • written 5.3 years ago by Ian 6.1k

Mauve may be a better tool for this application. Have you tried it?

5.3 years ago by GenoMax 147k

Thanks. I have given Mauve a try, but it didn't get me as close to what I wanted as WGvista did.

5.3 years ago by Ian 6.1k

PhyloVISTA appears to be able to use MFA files and should show what you want.

5.3 years ago by GenoMax 147k

Funnily enough, I had looked at it because it mentioned the use of MFA files. Unfortunately, it assumes that every sequence in the file is a different species and that the file is a single multiple alignment, whereas my file contains many pairwise alignments. I reran Mauve, and its .alignment file is in an almost identical format to the output of WGvista. Thanks for your interest!

5.3 years ago by Ian 6.1k