I'm trying to run samtools markdup on all 38 of my .sorted.bam files in a folder on a shared cluster, but I'm attempting my first pipe job script to do this because it's so many steps. I tried the following job script, but it didn't seem to work the way I thought it would. Can anyone point out the errors? I don't need the intermediate files from the fixmate and sorting, just the final file that has gone through markdup, been name sorted, and has selected only paired mapped reads, as well as the markdup stats file.
for sorted in *.sorted.bam
do
stats=${sorted/.sorted.bam/.markdup.stats}
pairfilter=${sorted/.sorted.bam/.markdup.sorted.pairs.bam}
samtools fixmate -m $sorted - | \
samtools sort -@ 2 -o - - |\
samtools markdup -r -s -f $stats - - |\
samtools sort -@ 2 -n -o - - |\
samtools view -bf 1 - > $pairfilter
done
Thank you!
Edit: for reference, the error file from my job submission as is reports this:
Failed to open file "Gerson-11_paired_pec.markdup.stats" : No such file or directory
samtools markdup: failed to open "Gerson-11_paired_pec.markdup.stats" for input: No such file or directory
samtools sort: failed to read header from "-"
[main_samview] fail to read the header from "-".
Most likely
samtools markdup -r -s -f $stats - -
require a pre existing$stats
. May be add atouch $stats
to create the file before all the samtools commands. I am not 100% sure but will be curious to know if this is in fact the case.Thanks for the reply! That's a good point, but it didn't seem to be flagging it. I'm wondering if something was wrong with the error reporting because it gave me a different error output when I re-ran the shell script after removing it and re-uploading it to the cluster. Then it showed that the error was a lack of prefix in the sort function.