Hello,
I am attempting to run the pipeline https://github.com/UMCUGenetics/SHARC through a Docker container on my M1 MacBook and am running into some issues. The packages were built with GNU Guix, and I am wondering whether running them on macOS is the source of the error. While working with the author, we ran the same Docker command with the same data and the same bind mounts:
sudo docker run --platform linux/amd64 --mount type=bind,source=/Users/waples.lab/Desktop/bio/reference_data,destination=/tmp/data/ --mount type=bind,source=/Users/waples.lab/Desktop/bio/output,destination=/tmp/output/ -it jaesvi/sharc -f /tmp/data/ -o /tmp/output/ -mr /tmp/data/hg19.fa -mhr 4:0:0
Yet I do not get the same output he does. The only difference between our commands is that I pass --platform linux/amd64, because the M1 MacBook is, to my understanding, an arm64 architecture. I have also tried omitting the flag, but I still get the same error.
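One way to confirm what is actually running is to compare the host architecture with the image's architecture. On Apple Silicon, `uname -m` reports `arm64`, so a `linux/amd64` image runs under Docker Desktop's x86-64 emulation rather than natively. If the image provides only an amd64 build, Docker typically pulls and runs that build anyway (with a platform-mismatch warning), which may be why omitting the flag changes nothing. The sketch below assumes bash and a locally pulled `jaesvi/sharc` image. `platform_mode` is a hypothetical helper, and its answer only tells you whether emulation is involved, not that emulation causes the failure.

```shell
#!/usr/bin/env bash
# Sketch: report whether an image would run natively or under emulation.
# platform_mode is a hypothetical helper; the name mapping covers the common
# cases (Apple Silicon reports "arm64", Linux on ARM reports "aarch64").
platform_mode() {
  local host_arch="$1" image_arch="$2"
  case "$host_arch" in
    x86_64|amd64) host_arch=amd64 ;;
    arm64|aarch64) host_arch=arm64 ;;
  esac
  if [ "$host_arch" = "$image_arch" ]; then
    echo native
  else
    echo emulated
  fi
}

host="$(uname -m)"
image_arch=amd64   # what --platform linux/amd64 requests

# If Docker is available and the image is pulled, ask Docker which
# architecture the local image was built for; otherwise keep the default.
if command -v docker >/dev/null 2>&1 &&
   arch_out="$(docker image inspect --format '{{.Architecture}}' jaesvi/sharc 2>/dev/null)" &&
   [ -n "$arch_out" ]; then
  image_arch="$arch_out"
fi

echo "host=$host image=$image_arch mode=$(platform_mode "$host" "$image_arch")"
```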
From the logs, the error appears to occur in the mapping step of the pipeline; the relevant code is here:
https://github.com/UMCUGenetics/SHARC/blob/master/steps/minimap2.sh
# Map the reads, convert SAM to BAM and sort, all as a single pipeline.
CMD="$MINIMAP2 -t $THREADS $SETTINGS $REF $FASTQ | \
$SAMBAMBA view -h -S --format=bam -t 8 /dev/stdin | \
$SAMBAMBA sort -m9G -t $THREADS --tmpdir=./ /dev/stdin \
-o $OUTPUT"
echo "$CMD"
eval "$CMD"
# Sanity check: every read in the FASTQ should appear in the BAM.
if [ -e "$OUTPUT" ]; then
  # A FASTQ record is 4 lines, so reads = lines / 4.
  NUMBER_OF_READS_IN_FASTQ=$(awk '{s++}END{print s/4}' "$FASTQ")
  # Count distinct read names (column 1), so reads with several alignment records are counted once.
  NUMBER_OF_READS_IN_BAM=$($SAMBAMBA view "$OUTPUT" | cut -f 1 | sort | uniq | wc -l)
  if [ "$NUMBER_OF_READS_IN_FASTQ" == "$NUMBER_OF_READS_IN_BAM" ]; then
    touch "$OUTPUT.done"
  else
    echo "Number of reads in the fastq file ($NUMBER_OF_READS_IN_FASTQ) is different than the number of reads in the bam file ($NUMBER_OF_READS_IN_BAM)" >&2
  fi
fi
I believe the failure happens somewhere in the code above, because my log shows this error:
Number of reads in the fastq file (187182) is different than the number of reads in the bam file (0)
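In the quoted script, `eval` runs minimap2, `sambamba view` and `sambamba sort` as one pipeline and never checks its exit status. By default, a pipeline's status is only that of its last stage, so a minimap2 failure could go unnoticed until the read-count check reports 0 reads. The sketch below assumes bash and uses `false` as a stand-in for a failing first stage. It shows the default behaviour, the bash `PIPESTATUS` array, and `set -o pipefail`. Adding `set -o pipefail` before the `eval` and checking its return code is one possible way to find out which stage failed; it is a diagnostic idea, not a confirmed fix.

```shell
#!/usr/bin/env bash
# Sketch: `false` stands in for a crashing minimap2; the two `cat` stages
# stand in for the sambamba steps that exit cleanly on empty input.

# Default: the pipeline reports only the last stage's status.
false | cat | cat
default_status=$?
echo "default status: $default_status"      # 0, the failure is masked

# PIPESTATUS (bash only) keeps the exit code of every stage of the
# most recent pipeline; read it immediately after the pipeline runs.
false | cat | cat
statuses="${PIPESTATUS[*]}"
echo "per-stage statuses: $statuses"        # 1 0 0

# With pipefail, the pipeline fails if any stage fails.
caught=no
set -o pipefail
if ! false | cat | cat; then
  caught=yes
fi
set +o pipefail
echo "pipefail caught the failure: $caught"
```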
Could anyone shed some light on whether this is a compatibility problem caused by running a Docker image built from Linux binaries on a MacBook Pro, or an issue with the script that prevents it from working on a Mac? The script runs through to the end, but I am left with no output, and near the beginning I also get the error message "basename: missing operand". Thank you in advance for your help; anything is appreciated!
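"basename: missing operand" is the message GNU coreutils' basename prints when it is called with no argument at all. In a shell script, that usually means an unquoted variable holding a file path expanded to nothing, for example because a lookup in the mounted /tmp/data/ directory found no matching file. Which SHARC step calls basename is not shown here, so this is only a lead worth checking. The sketch assumes bash; SAMPLE_FASTQ, require_path and the example path are hypothetical names, not taken from SHARC.

```shell
#!/usr/bin/env bash
# Sketch: how an empty path variable produces "basename: missing operand".
# SAMPLE_FASTQ is a hypothetical variable name, not taken from SHARC.
SAMPLE_FASTQ=""

# Unquoted and empty: basename receives zero operands and exits non-zero.
if ! basename $SAMPLE_FASTQ 2>/dev/null; then
  echo "basename failed: the path variable is empty"
fi

# Hypothetical guard: stop early with a clear message instead of letting
# the script continue with an empty path.
require_path() {
  if [ -z "$1" ]; then
    echo "error: expected a file path but got an empty value" >&2
    return 1
  fi
  basename "$1"
}

require_path "/tmp/data/reads.fastq"    # prints reads.fastq (hypothetical path)
require_path "$SAMPLE_FASTQ" || echo "guard stopped the empty path"
```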
What is the error that you are getting?
The error is:
Number of reads in the fastq file (187182) is different than the number of reads in the bam file (0)
I am assuming this occurs because minimap2 did not execute properly. What I should have got in my job log is:
Is the input data for this test run included in the Docker image, or are you downloading it separately? If the latter, I suggest checking the number of lines in the file or its md5 checksum to make sure the whole file downloaded completely.
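The suggested check can be scripted. A FASTQ record spans four lines, so a line count that is not a multiple of 4 points to a truncated file, and comparing checksums with another copy confirms the files are byte-identical. This is a sketch assuming bash; check_fastq and checksum are hypothetical helper names. The md5 fallback is there because macOS ships `md5` rather than GNU `md5sum`, which matters when comparing a Mac copy against a Linux copy.

```shell
#!/usr/bin/env bash
# Sketch: basic integrity checks for a FASTQ file (hypothetical helpers).

# A FASTQ record is four lines; a remainder suggests truncation.
check_fastq() {
  local fq="$1" lines
  lines=$(awk 'END { print NR }' "$fq")
  if [ $((lines % 4)) -ne 0 ]; then
    echo "$fq: $lines lines, not a multiple of 4 (possibly truncated)"
    return 1
  fi
  echo "$fq: $lines lines, $((lines / 4)) reads"
}

# md5sum comes with GNU coreutils (Linux); macOS provides `md5 -q` instead.
checksum() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"
  fi
}

# Demo on a tiny two-read FASTQ written to a temporary file.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo"
check_fastq "$demo"
checksum "$demo"
rm -f "$demo"
```

Running checksum on both machines and comparing the two hashes would rule out a corrupted or incomplete download.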
I downloaded the input data separately and mounted it into the Docker container with bind mounts. The files downloaded completely and correctly. I shared the same files (FASTQ and reference genome) that I mounted into the container with the author of the pipeline, and he was able to run it and get the correct output. This leaves me wondering whether the issue lies in cross-compatibility between operating systems: we ran the same Docker command, and the only difference is that he is using Linux, whereas my machine runs macOS.