Hey guys,
When I annotate genomes with Prokka, using prokka --outdir mydir --prefix mygenome contigs.fa, I noticed that the FASTA headers in the output .fna file only carry this designation:
>NODE_57_length_4609_cov_3.05627
ATTTTATTATGGTGATCCCCTGGGCGAAATGCGCCTGGTAAGCAGAGTTTTTGAAATGTA
AGGCCTTTGAATAAGACAAAAGGCTGCCTCATCGCTAACTTTGCAACAGTGCCCTTGATA
TCTAGTATGACGTCT...
How can I modify the command so that the strain name appears before NODE? Thanks!
Did you try the --genus Escherichia --species coli --strain POO247 command-line arguments?
The question is that I have several genomes to annotate, and it would be great to create a loop to annotate the .fna file with the corresponding strain name!
Put that command in a loop. This is just an outline.
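As a rough sketch of what such a loop could look like: the snippet below only prints one prokka command per input file (a dry run), so nothing is executed until the output has been checked. It assumes the inputs are named like Escherichia_coli_146411.fasta and that the strain is whatever follows the last underscore; strain_from_filename and build_prokka_cmd are hypothetical helper names, and the genus and species are hard-coded for E. coli. Note that --strain is recorded in the annotation and is not expected to change the contig names in the .fna headers.

```bash
# Derive the strain from a filename such as Escherichia_coli_146411.fasta.
# Assumption: the strain is the text after the last underscore.
strain_from_filename() (
    base=$(basename "$1")
    base=${base%.*}                 # drop the extension
    printf '%s\n' "${base##*_}"     # keep what follows the last underscore
)

# Print the prokka command for one FASTA file (dry run, nothing is executed).
build_prokka_cmd() (
    fasta=$1
    base=$(basename "$fasta")
    base=${base%.*}
    strain=$(strain_from_filename "$fasta")
    printf 'prokka --outdir %s --prefix %s --genus Escherichia --species coli --strain %s %s\n' \
        "$base" "$base" "$strain" "$fasta"
)

# One command per genome in the current directory.
for f in *.fasta; do
    [ -e "$f" ] || continue         # skip if the glob matched nothing
    build_prokka_cmd "$f"
done
```

Once the printed commands look right, the output of the loop can be piped to sh to actually run them.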
Didn't work... I still get the >NODE... designation within the .fna file. No strain name on the fasta-headers
This was not code that was going to work. It was just an idea of how you could construct something that will allow you to use @h.mon's suggestion of using the correct command-line options for prokka. Post some names you have and we can work on code that will get you closer to the solution.
So, I'm working on several E. coli genomes. One of the fasta files has this name: Escherichia_coli_146411.fasta. Within the file, the fasta-headers are annotated like this
It will create an output file Escherichia_coli_146411, with all relevant outputs (proteins and coding sequences) annotated according to the strain name, but the output .fna file will keep exactly the same fasta-headers as the input. Can you help me out?
Thanks, this should work! Could you please convert this into a loop? Since the strain name is in the fasta filename, I would need a loop that takes the strain name from the filename and writes it into the fasta-header of each contig in the .fna file. Thanks!
There's no way to do this directly in the fasta header. You'll have to modify it after the fact. The --prefix argument governs the output filenames, however (--locustag only sets the locus tag prefix), so you can delineate your files based on that and modify afterwards using Genomax's suggestions (among others).
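One way to do the after-the-fact edit, as a minimal sketch: the function below prepends a strain name to every header line of a FASTA file, and the loop applies it to each .fna file found one directory down. It assumes every genome was annotated into its own directory with the file named like Escherichia_coli_146411/Escherichia_coli_146411.fna, and that the strain is the text after the last underscore; add_strain_to_headers and the .renamed.fna suffix are hypothetical names chosen for illustration. The original files are left untouched.

```bash
# Prepend "<strain>_" to every FASTA header; sequence lines pass through unchanged.
# Usage: add_strain_to_headers STRAIN FILE
add_strain_to_headers() {
    awk -v strain="$1" '
        /^>/ { print ">" strain "_" substr($0, 2); next }   # header line
             { print }                                      # sequence line
    ' "$2"
}

# Assumed layout: <name>/<name>.fna, with the strain after the last underscore.
for fna in */*.fna; do
    [ -e "$fna" ] || continue                      # glob matched nothing
    case $fna in *.renamed.fna) continue ;; esac   # skip files written by an earlier run
    base=$(basename "$fna" .fna)
    strain=${base##*_}
    add_strain_to_headers "$strain" "$fna" > "${fna%.fna}.renamed.fna"
done
```

With the header from the question, >NODE_57_length_4609_cov_3.05627 in Escherichia_coli_146411.fna would become >146411_NODE_57_length_4609_cov_3.05627 in Escherichia_coli_146411.renamed.fna.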