Question

Samtools pipe in job script to modify BAM files

0

Entering edit mode

2.6 years ago

Cory ▴ 10

I'm trying to run samtools markdup on all 38 of my .sorted.bam files in a folder on a shared cluster, but I'm attempting my first pipe job script to do this because it's so many steps. I tried the following job script, but it didn't seem to work the way I thought it would. Can anyone point out the errors? I don't need the intermediate files from the fixmate and sorting, just the final file that has gone through markdup, been name sorted, and has selected only paired mapped reads, as well as the markdup stats file.

for sorted in *.sorted.bam
do
    stats=${sorted/.sorted.bam/.markdup.stats}
    pairfilter=${sorted/.sorted.bam/.markdup.sorted.pairs.bam}
    samtools fixmate -m $sorted - | \
    samtools sort -@ 2 -o - - |\
    samtools markdup -r -s -f $stats - - |\
    samtools sort -@ 2 -n -o - - |\
    samtools view -bf 1 - > $pairfilter
done

Thank you!

Edit: for reference, the error file from my job submission as is reports this:

Failed to open file "Gerson-11_paired_pec.markdup.stats" : No such file or directory
samtools markdup: failed to open "Gerson-11_paired_pec.markdup.stats" for input: No such file or directory
samtools sort: failed to read header from "-"
[main_samview] fail to read the header from "-".

samtools bam slurm • 1.6k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 2.6 years ago by Cory ▴ 10

0

Entering edit mode

Most likely samtools markdup -r -s -f $stats - - require a pre existing $stats . May be add a touch $stats to create the file before all the samtools commands. I am not 100% sure but will be curious to know if this is in fact the case.

ADD REPLY • link 2.6 years ago by sure ▴ 100

0

Entering edit mode

Thanks for the reply! That's a good point, but it didn't seem to be flagging it. I'm wondering if something was wrong with the error reporting because it gave me a different error output when I re-ran the shell script after removing it and re-uploading it to the cluster. Then it showed that the error was a lack of prefix in the sort function.

ADD REPLY • link 2.6 years ago by Cory ▴ 10

score 1 · Accepted Answer · 2022-04-15

I fiddled around a bit and it looks like one issue was that I didn't provide a prefix for the temporary search files. I'm not sure why that hadn't shown up as an error, but this code seems to be working:

for sorted in *.sorted.bam do stats=${sorted/.sorted.bam/.markdup.stats} pairfilter=${sorted/.sorted.bam/.markdup.sorted.pairs.bam} samtools fixmate -m $sorted - | \ samtools sort -T TMP1 - |\ samtools markdup -r -s -f $stats - - |\ samtools sort -n -T TMP1 - |\ samtools view -bf 1 - > $pairfilter done