Question

prokka for several files at once

4.8 years ago

AbdelAbdel ▴ 30

I would like to know how to run prokka on a folder that contains several .fasta files (several genomes), so that all of them get annotated. Thank you for any ideas.

annotation • 5.4k views

ADD COMMENT • link updated 4.8 years ago by Mensur Dlakic ★ 28k • written 4.8 years ago by AbdelAbdel ▴ 30

What have you tried? How does prokka take a FASTA file as an input argument? Can multiple files be provided? If one file is expected, can process substitution be used? You should ask and exhaust these questions yourself.

ADD REPLY • link 4.8 years ago by Ram 44k

Now I'm intrigued about how you intend to use process substitution for this...

ADD REPLY • link 4.8 years ago by cschu181 ★ 2.8k

If only one file is expected with a parameter, say -f, you can use -f <(cat file1 file2 file3), and that is how process substitution can be of value here. The point of my comment was to get OP to think and invest some effort on how to solve their problem.
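To make the idea concrete, here is a small bash sketch. It uses grep as a stand-in for a tool that expects a single input file, and the file names and contents are made up for illustration. Whether prokka itself accepts input this way is not tested here.

```shell
#!/usr/bin/env bash
# Illustration only: the files below are made-up toy FASTA records.
workdir="$(mktemp -d)"
printf '>contig_A\nACGT\n' > "$workdir/file1.fasta"
printf '>contig_B\nGGCC\n' > "$workdir/file2.fasta"

# <(...) expands to a path that the tool reads like a regular file.
# grep -c '^>' counts the FASTA header lines in the combined stream.
n_records="$(grep -c '^>' <(cat "$workdir/file1.fasta" "$workdir/file2.fasta"))"
echo "records seen by the tool: $n_records"
```

The tool sees one concatenated stream, so records from all the files end up in a single input.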

ADD REPLY • link 4.8 years ago by Ram 44k

Ok makes sense, but that would mix them, which may be reasonable or not, depending on what's in the files. Was just curious.

ADD REPLY • link 4.8 years ago by cschu181 ★ 2.8k

score 2 · Answer 1 · 2020-03-11

I'm assuming that you want to annotate each fasta file separately. If that's correct, then you should be able to do this relatively easily with GNU parallel.

According to the GitHub page, the simplest prokka usage is just:

prokka <input fasta file>

Therefore, if you want to run prokka on several input fasta files simultaneously, you could do this with GNU parallel. For example (assuming the fasta files are in your current working directory):

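ls *.fasta | parallel --verbose "prokka --outdir {.}_out --prefix {.} {}"

In the above command, each fasta file name is piped to parallel, which will launch a separate prokka analysis for each of those fasta files. Each run gets its own output directory ("{.}_out"), and the output file names will be based on the input fasta file names, with the ".fasta" extension removed. The "--outdir" option matters here: without it, prokka names its output folder after the current date, so all runs started on the same day would try to use the same folder. The "--verbose" flag will print the prokka command for each input fasta file to the screen, which makes it easier to understand what exactly is going on.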

Note that I have not tested the above command, so you might consider adding parallel's "--dry-run" flag first. This will print out the commands to be run without actually running them.
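If GNU parallel is not available, the same one-run-per-file idea can be sketched as a plain bash loop that runs the files one after another. This is an untested sketch against real data: the function name annotate_all and the DRY_RUN switch are made up for this example, and the output directories are created relative to the current working directory.

```shell
#!/usr/bin/env bash
# Sketch: one prokka run per .fasta file in a directory, run sequentially.
# annotate_all and DRY_RUN are hypothetical names chosen for this example.
annotate_all() {
    local dir="${1:-.}"
    local fasta base
    for fasta in "$dir"/*.fasta; do
        [ -e "$fasta" ] || continue          # the glob matched nothing
        base="$(basename "$fasta" .fasta)"   # genome1.fasta -> genome1
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo prokka --outdir "${base}_out" --prefix "$base" "$fasta"
        else
            prokka --outdir "${base}_out" --prefix "$base" "$fasta"
        fi
    done
}
```

Calling it as DRY_RUN=1 annotate_all /path/to/genomes prints the prokka commands without running them; dropping DRY_RUN=1 runs them for real.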

You can find many great GNU parallel examples here.

Of course, if you didn't actually want to annotate these genomes separately, then the above approach will not be what you want.

score 0 · Answer 2 · 2020-03-11

4.8 years ago

Mensur Dlakic ★ 28k

A similar topic was discussed a couple of days ago - see here.

ADD COMMENT • link 4.8 years ago by Mensur Dlakic ★ 28k