Question

Mauve Headers / Add DNA sequence to genbank file

9.5 years ago

Cricket ▴ 10

I have several GenBank files that I would like to align using Mauve and then export the ortholog alignments to a file. That file will be analyzed with an in-house script, which expects headers in the following format [>fileNumber:start-stop:Name]:

>0:1483-2550:Campy1147c_20 +
TTATATCACATTGCTGAAAA........

This is no problem for GenBank files that contain the sequence. However, when I use a FASTA file, I get a header like this:

>7
TTATATCACATTGCTGAAAA........

This is also the format I see when the genome in question does not have a particular ortholog:

>7
--------------------------------------------------------------------------------
----------------------------------------------------------------------...
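If rewriting the in-house script turns out to be the easier route, a minimal sketch of a header parser that accepts both forms shown above might look like this. The field names, the `OrthologHeader` type, and the reading of the trailing `+` as a strand are assumptions for illustration; they are not taken from Mauve's documentation or from the actual script.

```python
import re
from typing import NamedTuple, Optional


class OrthologHeader(NamedTuple):
    """Parsed header; fields are None when the header is a bare '>N'."""
    file_number: int
    start: Optional[int]
    stop: Optional[int]
    name: Optional[str]
    strand: Optional[str]


# Matches '>0:1483-2550:Campy1147c_20 +' as well as the bare '>7' form.
# The trailing '+'/'-' is assumed to be a strand indicator.
HEADER_RE = re.compile(r"^>(\d+)(?::(\d+)-(\d+):(\S+)(?:\s+([+-]))?)?\s*$")


def parse_header(line: str) -> OrthologHeader:
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"Unrecognized header: {line!r}")
    number, start, stop, name, strand = match.groups()
    return OrthologHeader(
        file_number=int(number),
        start=int(start) if start is not None else None,
        stop=int(stop) if stop is not None else None,
        name=name,
        strand=strand,
    )
```

With this, `parse_header(">7")` yields a record whose coordinate and name fields are `None`, so downstream code can decide how to treat entries that carry only a file number.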

My problem is that some of the GenBank files (http://www.ncbi.nlm.nih.gov/nuccore/CP006702) for some reason do not contain any sequence (neither DNA nor translated amino acid). Including them in Mauve throws an error (after all, there is no sequence to align).

There is a FASTA file that I can download. However, as mentioned above, and with my limited Mauve experience, when I export the orthologs after alignment, the headers do not include any information (other than >[1-9]*).

As I see it, I can either rewrite my code (mild pain) or figure out one of these two items:

1. In Mauve, is there a way to force the headers into the exported ortholog file when using a FASTA file (with the file name or the GI from the FASTA file)?
2. Is there a way to get the sequence from the FASTA file into the corresponding GenBank file?
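For the second item, one possible approach is to append an ORIGIN block built from the FASTA sequence to the sequence-less GenBank record. The sketch below uses only the Python standard library and assumes a single-record GenBank file that ends with `//` and a FASTA file whose first record is the matching genome. The function names are hypothetical, and whether Mauve accepts the result (for example, if a CONTIG line is still present) is untested here.

```python
def read_fasta_sequence(fasta_text: str) -> str:
    """Return the sequence of the first record in FASTA-formatted text."""
    parts = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
        elif line and seen_header:
            parts.append(line)
    return "".join(parts)


def format_origin(sequence: str) -> str:
    """Format a sequence as a GenBank ORIGIN block (60 bases per line)."""
    seq = sequence.lower()
    lines = ["ORIGIN"]
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        blocks = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        # Position is right-justified in 9 columns, followed by 10-base blocks.
        lines.append(f"{start + 1:>9} " + " ".join(blocks))
    return "\n".join(lines)


def add_sequence_to_genbank(genbank_text: str, fasta_text: str) -> str:
    """Append an ORIGIN block to a single-record GenBank text lacking one."""
    if "\nORIGIN" in genbank_text:
        raise ValueError("GenBank record already contains an ORIGIN block")
    sequence = read_fasta_sequence(fasta_text)
    if not sequence:
        raise ValueError("No sequence found in FASTA text")
    body = genbank_text.rstrip()
    if body.endswith("//"):
        body = body[:-2].rstrip()  # drop the record terminator; re-added below
    return body + "\n" + format_origin(sequence) + "\n//\n"
```

A typical use would be to read both files into strings, call `add_sequence_to_genbank(gb_text, fa_text)`, and write the returned text to a new `.gbk` file, leaving the original download untouched.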

genbank alignment fasta • 2.0k views

updated 3.0 years ago by Ram 45k • written 9.5 years ago by Cricket ▴ 10