I have a folder with approximately 3,000 FASTA files. Each FASTA file corresponds to one gene (orthogroup) and contains multiple sequences (orthologues from multiple species).
I want to run a multiple sequence alignment in ClustalO or MUSCLE for each of these files (genes). I have a script that works for a single input FASTA file and creates one multiple sequence alignment.
Does anyone know how I can run these thousands of alignments in parallel, or at least submit them with one script? I really don't want to do this manually for over 3,000 alignment jobs.
Thanks!
Hi GenoMax, and sorry for not clarifying my question. Yes, I am submitting the job(s) via a job scheduler (SGE), so I am an HPC user. :) I might have used the term parallelization wrong; I meant it more in the sense of submitting one job (script) that produces multiple, separate MSAs. I read about `for` loops in some previous posts, but I am not familiar with them, so I'll give it a try. Thank you!
Here is a very simple way of doing this. You will need to figure out how to enclose these commands for submission via SGE. The source files are named `Sample_1.fa`, `Sample_2.fa`, etc. in this example.

Note: `echo` is in the command just to print the commands to the screen without executing them. `basename` strips the `.fa` extension from each file name obtained from the loop, so you can use the resulting `sample_name` to create new output file names like this: `${name}.aln.afa`.

Thank you so much for making this example! It makes sense to me and I got the general idea.
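For anyone following along, a loop of the kind described in the answer above could look roughly like the sketch below. It is not the original command from the answer. The function name `print_alignment_commands` is made up for illustration, and the MUSCLE flags assume MUSCLE v5 (`-align`/`-output`); MUSCLE v3 uses `-in`/`-out` instead, so check which version you have.

```shell
#!/bin/bash
# Dry run: print one alignment command per FASTA file instead of executing it.
# Assumption: MUSCLE v5 syntax (-align/-output); MUSCLE v3 uses -in/-out.
# print_alignment_commands is a hypothetical helper name.
print_alignment_commands() {
    local dir="$1"
    local fasta name
    for fasta in "$dir"/*.fa; do
        [ -e "$fasta" ] || continue       # no .fa files: skip the unexpanded glob
        name=$(basename "$fasta" .fa)     # Sample_1.fa -> Sample_1
        echo muscle -align "$fasta" -output "$dir/${name}.aln.afa"
    done
}

# Print the commands for the .fa files in the current directory.
print_alignment_commands .
```

Once the printed commands look right, removing `echo` makes the loop actually run them.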
What does `ls -1` mean? I know what `ls` stands for, but why `-1`?

I wrote this script and submitted the job from the same folder where the script and the FASTA files (`.fa` extension) are. I changed the file names, since mine are called, for example, `OG0008990.fa`, and added `-threads $NSLOTS`.
It looks like this:
But I got this error message: it seems like it doesn't recognise the `do` token.

I'll try to figure this out for SGE and post it here when I get it right, but this is a very good start! Thank you!
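The submitted script and the exact error message are not preserved in this thread, so the following is only a sketch of how such a loop might be wrapped for SGE. The job name `muscle_all`, the parallel environment name `smp`, the slot count, and the helper name `run_alignments` are all assumptions (parallel environment names are site-specific), and the MUSCLE flags again assume v5. Requesting bash with `-S /bin/bash` is included because some SGE queues start jobs under a different default shell, which is one possible reason for a shell rejecting `do`.

```shell
#!/bin/bash
# Job name (hypothetical)
#$ -N muscle_all
# Run in the directory the job was submitted from
#$ -cwd
# Ask SGE to interpret this script with bash
#$ -S /bin/bash
# Request 4 slots; the parallel environment name "smp" is site-specific
#$ -pe smp 4

# NSLOTS is set by SGE inside a job; fall back to 1 thread elsewhere.
threads=${NSLOTS:-1}

# run_alignments is a hypothetical helper: align every .fa file in a directory.
# Assumption: MUSCLE v5 syntax (-align/-output/-threads).
run_alignments() {
    local dir="$1"
    local fasta name
    for fasta in "$dir"/*.fa; do
        [ -e "$fasta" ] || continue       # no .fa files: nothing to do
        name=$(basename "$fasta" .fa)     # OG0008990.fa -> OG0008990
        muscle -align "$fasta" -output "$dir/${name}.aln.afa" -threads "$threads"
    done
}

run_alignments .
```

Saved under a name of your choice and submitted with `qsub`, this runs the alignments one after another inside a single job.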
I forgot to remove `echo`! Now it works perfectly fine! It will take some time, but I can already see in the stdout file that MUSCLE is doing its job, and my alignment files are being created. Big thanks, you helped so much!
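If the single-job loop turns out to be too slow, one alternative that was not tried in this thread is an SGE array job, where each task aligns one file and the scheduler runs many tasks at once. The sketch below is untested on a real cluster. The range `1-3000` is a placeholder for the actual number of files, `nth_fasta` is a hypothetical helper, and the MUSCLE flags assume v5. It also touches on the `ls -1` question: `-1` makes `ls` print one entry per line, which is what lets `sed` pick the N-th file. This assumes file names without spaces or newlines.

```shell
#!/bin/bash
# Run in the submission directory, under bash
#$ -cwd
#$ -S /bin/bash
# One task per file; 1-3000 is a placeholder for the real file count
#$ -t 1-3000

# nth_fasta (hypothetical helper): print the N-th .fa file in a directory.
# $1 = directory, $2 = 1-based index. ls -1 lists one file per line, sorted.
nth_fasta() {
    ls -1 "$1"/*.fa 2>/dev/null | sed -n "${2}p"
}

# SGE sets SGE_TASK_ID for each array task; default to 1 outside the scheduler.
fasta=$(nth_fasta . "${SGE_TASK_ID:-1}")

if [ -n "$fasta" ]; then
    name=$(basename "$fasta" .fa)
    # Assumption: MUSCLE v5 syntax (-align/-output).
    muscle -align "$fasta" -output "${name}.aln.afa"
fi
```

Each task then writes its own `.aln.afa` file, so tasks do not interfere with each other.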