I am writing a Bash script that has the following aims:
- Compare multiple BAM files (over 100) using samtools to obtain the number of mapped reads
- Find the BAM file with the smallest number of reads
- Based on this smallest number, use seqtk to scale all other BAM files
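The first two aims can be sketched as two Bash functions. This is a minimal sketch that makes several assumptions:

- samtools is on PATH.
- "Mapped reads" means primary mapped records. `-F 0x904` excludes unmapped, secondary and supplementary alignments, so for paired-end data this counts reads, not pairs.
- Bash is version 4 or later, which the associative array needs.

The function names `count_mapped` and `downsample_fractions` are hypothetical. The printed fraction (smallest count divided by each file's count) is the kind of value a downsampling tool expects. `seqtk sample` accepts such a fraction but reads FASTA/FASTQ, whereas `samtools view -s` can subsample a BAM directly.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper names): count mapped reads per BAM, find the
# smallest count, and print the fraction each file would need to be
# downsampled to. Assumes samtools is on PATH and Bash >= 4.

# Count primary mapped records in one BAM.
# -F 0x904 drops unmapped (0x4), secondary (0x100) and supplementary (0x800).
count_mapped() {
    samtools view -c -F 0x904 "$1"
}

# Print "<file><TAB><fraction>" for every BAM given as an argument,
# where fraction = smallest_count / this_file_count.
downsample_fractions() {
    local -A counts=()
    local bam min=""
    # First pass: count each file and remember the minimum.
    for bam in "$@"; do
        counts["$bam"]=$(count_mapped "$bam")
        if [[ -z $min || ${counts[$bam]} -lt $min ]]; then
            min=${counts[$bam]}
        fi
    done
    # Second pass: fraction relative to the smallest file (0 if a file is empty).
    for bam in "$@"; do
        awk -v f="$bam" -v m="$min" -v n="${counts[$bam]}" \
            'BEGIN { printf "%s\t%.6f\n", f, (n > 0 ? m / n : 0) }'
    done
}
```

For example, `downsample_fractions /path/to/files/*.bam` prints one tab-separated line per file. The file with the fewest mapped reads gets `1.000000`.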
My problem is to do with how to parse through all BAM files. I am very new to using getopts and I can't get it to do what I want: parse through every BAM file.
Here is my script so far:
#!/usr/bin/bash
# Define the usage and ensure it shows up on screen if no arguments are given
USAGE() { echo "Usage: bash $0 [-b <in-bam-files-dir>] [-o <out-dir>] [-c <chromlen>]" 1>&2; exit 1; }
if (($# == 0)); then
    USAGE
fi
# Use getopts to accept each argument
while getopts ":b:o:c:h" opt
do
    case $opt in
        b ) BAMFILES=$OPTARG
            ;;
        o ) OUTDIR=$OPTARG
            ;;
        c ) CHROMLEN=$OPTARG
            ;;
        h ) USAGE
            ;;
        \? ) echo "Invalid option: -$OPTARG, exiting" >&2
             exit 1
             ;;
        : ) echo "Option -$OPTARG requires an argument" >&2
            exit 1
            ;;
    esac
done
# Start parsing through BAM files - here is where I get stuck
for i in "$@"
do
    echo "$i"
done
Obviously, my for loop will not contain only echo; that is where I will sort each BAM file, count reads, etc. For now, I am just trying to make sure I am parsing the BAM files properly.
My question is: how do I get getopts to understand that BAMFILES is a directory containing multiple BAM files? At the moment, when my script reaches the for loop and runs echo, I get the following output:
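If `-b` is meant to name a directory, as the usage string's `<in-bam-files-dir>` suggests, the script can glob the BAM files itself. That avoids letting the calling shell expand `*bam`. The sketch below uses a hypothetical function name, `process_bams`. It enables `nullglob` so that an empty directory yields an empty list rather than the literal pattern. It also declares `OPTIND` local so getopts starts fresh each time the function is called.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical function name): -b names a DIRECTORY and the
# script globs the BAM files inside it, so the caller passes a plain path.

process_bams() {
    local opt bamdir="" OPTIND=1    # local OPTIND: getopts restarts on each call
    while getopts ":b:o:c:" opt; do
        case $opt in
            b) bamdir=$OPTARG ;;
            o) OUTDIR=$OPTARG ;;
            c) CHROMLEN=$OPTARG ;;
            \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
            :)  echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
        esac
    done
    if [[ ! -d $bamdir ]]; then
        echo "Error: -b must name an existing directory" >&2
        return 1
    fi

    local -a bams
    shopt -s nullglob               # unmatched glob -> empty array, not the pattern
    bams=("$bamdir"/*.bam)
    shopt -u nullglob

    local bam
    for bam in "${bams[@]}"; do
        printf '%s\n' "$bam"        # placeholder for sorting, counting reads, etc.
    done
}
```

The script would call it as `process_bams "$@"` and be run as `bash script.bash -b /path/to/files -o ../output/dir -c option`, with no `*bam` on the command line.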
$ bash script.bash -b /path/to/files/*bam -o ../output/dir -c option
# Output:
-b
/path/to/files/file1.bam
/path/to/files/file2.bam
/path/to/files/file1.bam
-o
../output/dir
-c
option
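This output comes from three things that happen before the loop runs:

- The calling shell expands `/path/to/files/*bam` into separate words, so the script receives `-b file1.bam file2.bam ...`.
- Bash's getopts hands only the first file to `-b` as `OPTARG` and stops at the first word that is not an option (the second file), so `-o` and `-c` are never parsed.
- `"$@"` is never shifted, so it still holds every word.

The hypothetical `show_getopts` function below is a diagnostic sketch that only reports what getopts parsed. It is not part of the pipeline.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (hypothetical name): report what getopts parsed and
# which word it stopped at. ${!OPTIND} is the positional parameter whose
# index is OPTIND, i.e. the first word getopts did not consume.

show_getopts() {
    local opt OPTIND=1 b="" o="" c=""
    while getopts ":b:o:c:" opt; do
        case $opt in
            b) b=$OPTARG ;;
            o) o=$OPTARG ;;
            c) c=$OPTARG ;;
        esac
    done
    echo "b=$b o=$o c=$c stopped_at=${!OPTIND}"
}
```

For instance, `show_getopts -b f1.bam f2.bam f3.bam -o outdir -c opt` reports `b=f1.bam o= c= stopped_at=f2.bam`.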
I guess the output I want is just:
/path/to/files/file1.bam
/path/to/files/file2.bam
/path/to/files/file1.bam
With that, I could use a for loop to process each BAM file in turn. I know that $@ is not the right thing to use in the for loop, but $1 just prints -b, not the BAM file names. I just can't seem to access only the BAM file names and nothing else.
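Another option is to pass the BAM files as trailing positional arguments after the options. The usual idiom is `shift $((OPTIND - 1))` once getopts has finished, which leaves only the non-option arguments in `"$@"`. The sketch below uses a hypothetical function name, `list_bam_args`. The options must come before the file list, because getopts stops at the first non-option word.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical function name): options first, BAM files last, e.g.
#   bash script.bash -o ../output/dir -c option /path/to/files/*bam
# After getopts, shift away the parsed options so "$@" holds only the files.

list_bam_args() {
    local opt OPTIND=1
    while getopts ":o:c:" opt; do
        case $opt in
            o) OUTDIR=$OPTARG ;;
            c) CHROMLEN=$OPTARG ;;
            \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
            :)  echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))           # drop everything getopts consumed
    if (( $# == 0 )); then
        echo "No BAM files given" >&2
        return 1
    fi
    local bam
    for bam in "$@"; do             # quoted so paths with spaces stay whole
        printf '%s\n' "$bam"
    done
}
```

If the script calls `list_bam_args "$@"`, then `bash script.bash -o ../output/dir -c option /path/to/files/*bam` prints only the BAM paths. A `--` before the file list also ends option parsing, which helps if a file name starts with `-`.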
Thanks!
No offense, but this question might be better placed on Stack Overflow; I'm sure you'll find more replies there.
Bash's getopts is limited and confusing, especially with larger option lists. If you want more complex functionality with easier-to-read code and without the pitfalls of Bash syntax, you might want to look into Python, specifically Python's argparse. Nothing is free, however: you'd have to become more familiar with Python, and the language has its own pitfalls too.