8.0 years ago
enkh.tug
Hello!
I need to run a Python script on a whole proteome, which is ~3600 proteins. To run it I need two separate files: a FASTA sequence file and a profile file. Both share the same file name, based on their headers. How do I run it on two folders (one with the FASTA sequences, the other with the profile files)? Do I have to modify the script, or write a new one?
Search for "bash loop files" and you'll find many hints, e.g., this one.
This should be reasonably straightforward to solve. Can you show the naming pattern of the fasta sequences and profile files? How do you know those files match to run the tool? Did you write the script yourself?
The script was not written by me. The FASTA files are fine, and the profile files were generated by the script's creator. The naming pattern is: C_PROKKA_00001 - C_PROKKA_03211 for the genome and pGX1_PROKKA for plasmids.
So if I understood correctly, you have 3211 files named C_PROKKA_00001 up to C_PROKKA_03211. And what about the plasmids? You didn't mention those before. Is it just one file? Does every C_PROKKA file have to be run together with the pGX1_PROKKA file?
I have 3151 chromosomal genome sequences and 521 plasmid sequences. I had a multi-FASTA file and I split it based on the headers. Each FASTA sequence has a corresponding profile file, and the two have to be run together. Besides, I need to concatenate the output into one text file, but if I set an output file and run a second sequence, it gets overwritten. Thank you for your interest in my problem.
I am not sure how you are using your command line, but generally the >> redirector will append data to an existing file. Can you post an example of the names of
It should be fairly easy to do this, for example see the answer of ole.tange for a gnu-parallel solution.
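For the plain loop route, here is a minimal sketch of pairing files by name and appending with >>. It assumes the FASTA files end in .fasta, the profiles end in .profile, and the script takes the two files as arguments and prints to stdout. None of that is stated in the thread, and run_tool is only a hypothetical placeholder for the real command.

```shell
#!/usr/bin/env bash

# Hypothetical placeholder for the real command, e.g.:
#   python script.py "$1" "$2"
run_tool() {
    echo "processed $(basename "$1") with $(basename "$2")"
}

# run_all FASTA_DIR PROFILE_DIR OUT_FILE
# Assumed layout: FASTA_DIR/NAME.fasta pairs with PROFILE_DIR/NAME.profile.
run_all() {
    local fasta_dir="$1"
    local profile_dir="$2"
    local out_file="$3"
    local fasta name profile

    : > "$out_file"                        # start from an empty output file
    for fasta in "$fasta_dir"/*.fasta; do
        [ -e "$fasta" ] || continue        # the glob matched nothing
        name=$(basename "$fasta" .fasta)   # e.g. C_PROKKA_00001
        profile="$profile_dir/$name.profile"
        if [ ! -f "$profile" ]; then
            echo "no profile for $name, skipping" >&2
            continue
        fi
        # >> appends to the file; a single > would overwrite it on every run
        run_tool "$fasta" "$profile" >> "$out_file"
    done
}
```

It could then be called as run_all fasta profiles all_results.txt, with the directory and file names adjusted to whatever you actually have.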
To set clear expectations: if you are doing this on a single computer, a bash loop will still run serially. Depending on how heavy the computation is in this case (and how many cores you have available on this machine), you could look into using GNU Parallel to speed this up to some extent.
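A rough sketch of what the GNU Parallel route might look like, assuming GNU Parallel is installed, the FASTA files end in .fasta, the profiles end in .profile, and the paths contain no whitespace. The echo is a hypothetical stand-in for the real script call, which the thread does not show.

```shell
#!/usr/bin/env bash

# run_all_parallel FASTA_DIR PROFILE_DIR OUT_FILE [JOBS]
# Assumed layout: FASTA_DIR/NAME.fasta pairs with PROFILE_DIR/NAME.profile.
run_all_parallel() {
    local fasta_dir="$1"
    local profile_dir="$2"
    local out_file="$3"
    local jobs="${4:-2}"     # number of simultaneous jobs

    # {/}  is the input's basename, {/.} the basename without its extension.
    # -k keeps the output in input order even though jobs finish out of order.
    # Replace "echo processed ... with ..." by the real command (hypothetical),
    # e.g.: python script.py {} "$profile_dir"/{/.}.profile
    printf '%s\n' "$fasta_dir"/*.fasta |
        parallel -k -j "$jobs" echo processed {/} with "$profile_dir"/{/.}.profile \
        > "$out_file"
}
```

GNU Parallel buffers each job's output and writes it out whole, so the combined file is not interleaved the way it could be if several background jobs appended with >> at once. Unlike a loop with an explicit check, this sketch does not skip sequences whose profile file is missing.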