Thank you for reading my question. I am attempting to create multiple MUSCLE alignments for thousands of FASTA files in a directory.
The input would be: vbro1.fasta, vbro2.fasta ... vbro6405.fasta, where a given file commonly contains 20 or more proteins.
The output would be: vbro1.afa, vbro2.afa... vbro6405.afa
My first thought was a for loop in bash (shown below). My second thought was to run this through Biopython's MUSCLE wrapper. Either way, I'm concerned that the job is too big and that I'll need to write a script that iterates through the multi-FASTA files in more manageable pieces.
# Prints each command for review; remove "echo" to run MUSCLE directly.
for i in outdir/*.fasta; do
    echo "muscle -in ${i} -out ${i%.fasta}.afa -maxiters 1 -diags1 -sv"
done
I would greatly appreciate some assistance if someone could point me in the right direction.
Nice! I like to see bash solutions. It is possible to avoid creating the command file:
find . -name "vbro*.fasta" -type f -print0 | xargs -0 --max-procs=4 -I FILE muscle -in FILE -out FILE.afa -maxiters 1 -diags1 -sv
Caveat: the output file names end in .fasta.afa, because FILE is substituted with the full input name.
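If the .fasta.afa names are a problem, one possible workaround is to let a small shell snippet strip the extension before calling MUSCLE. The following is only a sketch: align_all is a hypothetical function name, it assumes muscle is on the PATH and accepts the MUSCLE v3-style flags used in this thread, and it assumes an xargs that supports -0 and -P (GNU and BSD both do).

```shell
#!/usr/bin/env bash
# Sketch: align every vbro*.fasta under a directory, writing vbroN.afa
# next to each input. "align_all" is a hypothetical helper name.
# Usage: align_all [directory] [parallel_jobs]
align_all() {
    local dir="${1:-.}"    # directory to search (default: current directory)
    local jobs="${2:-4}"   # number of MUSCLE processes to run at once

    find "$dir" -name 'vbro*.fasta' -type f -print0 |
        xargs -0 -P "$jobs" -n 1 sh -c '
            in="$1"
            out="${in%.fasta}.afa"   # drop the .fasta suffix, then add .afa
            muscle -in "$in" -out "$out" -maxiters 1 -diags1 -sv
        ' _
}
```

Calling align_all . 4 from the directory holding the files should give vbro1.afa, vbro2.afa and so on, with four alignments running at a time. Each input is passed to sh as $1 rather than pasted into the command string, so file names with spaces should still work.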
Also an excellent solution that works very well! And thank you for showing me how to run the jobs in parallel.
I must say I learned new stuff about bash thanks to you.