Hi all,
I have a bash script that takes command line arguments to feed specific files into it. It works fine, but I want to optimize it by looping over a new directory, with each file in turn used as the argument. For example, my script is below:
#!/bin/bash
REF=$1
NAME=$2
if [ -f "$REF.bwt" ]
then
    echo "$REF already indexed, skipping!"
else
    echo "Indexing $REF"
    bwa index "$REF"
fi
#Use the indexed REF as input
for files in ../cleaned_files/*.txt
do
    if [ -f "${files%%.txt}_$NAME.sam" ]
    then
        echo "$files already cleaned, skipping!"
    else
        echo "Running assembly for $files"
        bwa mem -B 2 -t 40 "$REF" "$files" > "${files%%.txt}_$NAME.sam"
    fi
done
This script continues for a while with other steps. However, I have a directory with hundreds of new reference files that I want to run through my script. Normally I execute my script in the terminal as:
$ ./my_script.sh ref_1a.fa 1a_done
Here ref_1a.fa is the reference file I input, and 1a_done is just a naming variable I use. My script already runs for each raw file in my cleaned_files directory, but I am unsure how to properly add another for loop over my $REF files without breaking my script (which took me forever to get working because I am NOT great at scripting). I was thinking:
#!/bin/bash
REF=$1
NAME=$2
for ref_files in ../reference_files/*.fa
do
    if [ -f "$REF.bwt" ]
    then
        echo "$REF already indexed, skipping!"
    else
        echo "Indexing $REF"
        bwa index "$REF"
    fi
    #Use the indexed REF as input
    for files in ../cleaned_files/*.txt
    do
        if [ -f "${files%%.txt}_$NAME.sam" ]
        then
            echo "$files already cleaned, skipping!"
        else
            echo "Running assembly for $files"
            bwa mem -B 2 -t 40 "$REF" "$files" > "${files%%.txt}_$NAME.sam"
        fi
    done
done
With this second script, would it index the first reference file from reference_files and then use it for all the raw data files in cleaned_files, or is it going to try indexing ALL my reference files first? I need my_script.sh to finish completely for one reference file, and then repeat the whole script for the second reference file. Could anyone take a look and quickly tell me whether my process is right before I load it onto the server? Jobs don't execute right away because of the queue, which is why I am hesitant to submit different versions of my script and load up the queue. Thank you all so much :)
You are re-inventing the wheel. Use a workflow manager like Nextflow: https://www.nextflow.io/
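If a workflow manager is more than you want right now, a nested loop in plain bash also does the job. Bash runs the whole body of the outer loop for one reference (index it, then go through every file in cleaned_files) before it moves on to the next reference, so it will not index all references first. Note that the draft in the question still uses $REF and $NAME from the command line inside the loop, so the loop variable ref_files is never actually used. Below is a minimal sketch, not a drop-in replacement. It assumes the naming tag can be derived from the reference file name (ref_1a.fa gives ref_1a_done); the names run_all_refs, ref_dir and reads_dir are made up for illustration.

```shell
#!/bin/bash
# Sketch: run the per-reference steps once for every reference file.
# run_all_refs, ref_dir and reads_dir are hypothetical names, not part of bwa.

run_all_refs() {
    local ref_dir=$1 reads_dir=$2
    local ref name reads out

    for ref in "$ref_dir"/*.fa; do
        # If the glob matched nothing, skip the literal "*.fa" pattern
        [ -e "$ref" ] || continue

        # Assumption: derive the tag from the file name, ref_1a.fa -> ref_1a_done
        name=$(basename "$ref" .fa)_done

        if [ -f "$ref.bwt" ]; then
            echo "$ref already indexed, skipping!"
        else
            echo "Indexing $ref"
            bwa index "$ref"
        fi

        # The inner loop completes for this reference before the outer loop advances
        for reads in "$reads_dir"/*.txt; do
            [ -e "$reads" ] || continue
            out=${reads%.txt}_$name.sam
            if [ -f "$out" ]; then
                echo "$out already exists, skipping!"
            else
                echo "Running assembly for $reads against $ref"
                bwa mem -B 2 -t 40 "$ref" "$reads" > "$out"
            fi
        done
        # ...the remaining per-reference steps of the script would go here...
    done
}
```

You would call it as run_all_refs ../reference_files ../cleaned_files. Deriving the tag inside the loop matters: with a single fixed $NAME, every reference would write to the same .sam names and the second reference would skip everything as already done.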