I have several GenBank files that I would like to align using Mauve, and then export the ortholog alignments to a file. That file will be analyzed with an in-house script, which expects headers in the format [>fileNumber:start-stop:Name], for example:
>0:1483-2550:Campy1147c_20 +
TTATATCACATTGCTGAAAA........
This works fine for GenBank files that include the sequence. However, when I use a FASTA file, I get a header like this:
>7
TTATATCACATTGCTGAAAA........
This is also the format I see when the genome in question does not have a particular ortholog:
>7
--------------------------------------------------------------------------------
----------------------------------------------------------------------...
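To make the difference between the two header shapes concrete, here is a minimal Python sketch of how a script could tell them apart. It is not the in-house script itself: the regular expression and the name `parse_header` are assumptions based only on the examples above, and the trailing strand symbol is treated as optional.

```python
import re

# Assumed pattern for headers such as ">0:1483-2550:Campy1147c_20 +".
# The trailing "+" or "-" is treated as an optional strand marker.
HEADER_RE = re.compile(r"^>(\d+):(\d+)-(\d+):(\S+)(?:\s+([+-]))?\s*$")


def parse_header(line):
    """Return the header fields as a dict, or None for a bare header such as '>7'."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    file_number, start, stop, name, strand = match.groups()
    return {
        "file_number": int(file_number),
        "start": int(start),
        "stop": int(stop),
        "name": name,
        "strand": strand,
    }
```

With this sketch, the full header from the GenBank-derived export yields all fields, while a bare `>7` header returns `None`, which is the case my script currently cannot handle.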
My problem is that some of the GenBank files (http://www.ncbi.nlm.nih.gov/nuccore/CP006702) for some reason do not contain any sequence (neither translated amino acid nor DNA). Including them in Mauve throws an error (after all, there is no sequence to align).
There is a FASTA file that I can snag. However, with my limited Mauve experience, as mentioned previously, when I export the orthologs (post alignment), the headers will not include any information (other than >[1-9]*).
As I see it, I can re-write my code (mild pain) or figure out one of these two items...
- In Mauve, is there a way to force the headers into the exported ortholog file when using a fasta file (with the file name or the GI from the fasta file)?
- Is there a way to get the sequence from the fasta file into the corresponding genbank file?
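
For the second option, one possible approach is to append the FASTA sequence to the GenBank file as an ORIGIN section. The sketch below uses only the Python standard library and rests on several assumptions: the GenBank file holds a single record that ends with `//` and has no ORIGIN section yet, the FASTA file holds the matching sequence as its first record, and the function names are hypothetical. I have not verified that Mauve accepts the resulting file.

```python
def read_fasta_sequence(fasta_text):
    """Return the sequence of the first record in FASTA-formatted text."""
    parts = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
        elif line and seen_header:
            parts.append(line)
    return "".join(parts)


def format_origin(sequence):
    """Format a sequence as an ORIGIN block: 60 bases per line, in groups of 10."""
    lines = ["ORIGIN"]
    seq = sequence.lower()
    for i in range(0, len(seq), 60):
        chunk = seq[i:i + 60]
        groups = " ".join(chunk[j:j + 10] for j in range(0, len(chunk), 10))
        # The 1-based start position is right-justified in 9 columns.
        lines.append(f"{i + 1:>9} {groups}")
    return "\n".join(lines)


def add_sequence(genbank_text, fasta_text):
    """Insert an ORIGIN block before the closing '//' of a single-record file."""
    lines = genbank_text.rstrip().splitlines()
    if not lines or lines[-1].strip() != "//":
        raise ValueError("expected the record to end with '//'")
    if any(line.startswith("ORIGIN") for line in lines):
        raise ValueError("record already has an ORIGIN section")
    origin = format_origin(read_fasta_sequence(fasta_text))
    return "\n".join(lines[:-1] + [origin, "//"]) + "\n"
```

Usage would be along the lines of reading both files into strings, calling `add_sequence(genbank_text, fasta_text)`, and writing the result to a new `.gbk` file to load into Mauve.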