Hi all,
Can anyone explain the differences in output I am getting when running BBduk on my local server and on our High-Performance Computing (HPC) cluster?
See my comparative shell script below, including observations.
# These tests were run in the directory $trimming/bbduk/cluster, comparing results from the HPC against those from the local server.
# In the brace expansion "{,../}", the empty string thus stands for the HPC (current directory) and "../" for the local server.
## Compare file sizes
ls -l $(realpath ../test_paired_R[12].fq.gz) > compare_ll.txt
ssh user@HPC "ls -l $trimming/bbduk/test_paired_R[12].fq.gz" >> compare_ll.txt
#> Unexpectedly, file sizes differ.
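If the compressed sizes differ but the underlying reads are the same, the gzip metadata should already hint at it: `gzip -l` reports the stored uncompressed size, which should match even when the .gz sizes do not. A minimal sketch with a stand-in file (not the real reads):

```shell
# Demo: the same content compressed at different levels gives .gz files
# of different size, while `gzip -l` reports the same uncompressed size.
head -c 100000 /dev/urandom | base64 > demo.fq   # stand-in for real reads
gzip -c -1 demo.fq > demo_fast.fq.gz             # fast, weaker compression
gzip -c -9 demo.fq > demo_best.fq.gz             # slow, stronger compression
ls -l demo_fast.fq.gz demo_best.fq.gz            # compressed sizes differ
gzip -l demo_fast.fq.gz demo_best.fq.gz          # uncompressed sizes match
```

Running `gzip -l` on both copies of each file would tell you immediately whether only the compression differs.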
## Compare logs
scp user@HPC:$trimming/bbduk/test_paired.log ./
diff --side-by-side --width=$COLUMNS {,../}test_paired.log > diff_logs.txt
#> No differences in the Results section.
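One way to make the log comparison stricter is to diff only the summary counts and ignore fields that legitimately vary, such as timing. A sketch with stand-in log files (the real BBduk logs will have different field names):

```shell
# Demo: strip lines that are expected to vary (here 'Time'), then diff
# the remaining summary counts. Exit status 0 means the stats agree.
printf 'Time: 5s\nInput: 1000 reads\nResult: 900 reads\n' > hpc.log
printf 'Time: 7s\nInput: 1000 reads\nResult: 900 reads\n' > local.log
diff <(grep -v '^Time' hpc.log) <(grep -v '^Time' local.log) \
  && echo "summary stats identical"
```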
## Perhaps due to zipping?
scp user@HPC:$trimming/bbduk/test_paired_R[12].fq.gz ./
for gz in {,../}test_paired*.fq.gz; do gunzip -v -c "$gz" > "${gz%.gz}" & done; wait
#> Quick parallel unzip; `wait` blocks until every background gunzip has finished.
ll {,../}test_paired_R[12].fq > compare_ll_unzipped.txt
wc -c {,../}test_paired_R[12].fq
#> Now all four files have the same size?!
less {,../}test_paired*.fq
#> R1s look identical, but differ from R2s.
cmp {,../}test_paired_R1.fq
cmp {,../}test_paired_R2.fq
#> Both the R1 pair and the R2 pair differ, with the first mismatch around line 800.
#>> Why do they have the same size?!
#>> Why do they differ, but differ so subtly?!
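Files of equal size can still differ in content, and `cmp` pinpoints exactly where. A small demo with stand-in FASTQ records (the real files and line numbers will of course differ):

```shell
# Demo: two equal-size files whose content differs at one byte.
# wc -c agrees; cmp reports the first differing byte and line,
# and sed can then pull the surrounding records for inspection.
printf '@r1\nACGT\n+\nFFFF\n' > hpc_R1.fq
printf '@r1\nACGA\n+\nFFFF\n' > local_R1.fq
wc -c hpc_R1.fq local_R1.fq            # same byte count
cmp hpc_R1.fq local_R1.fq || true      # prints differing byte/line
sed -n '1,4p' hpc_R1.fq                # view the record around that line
```

On the real data, running something like `sed -n '790,810p'` on each copy would show the reads around the reported difference.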
My colleagues have proposed some explanations for why the gzipped file sizes differ between the local server and the HPC, such as different block sizes or different compression options. However, I am mostly interested in the difference between the outputs themselves. Does BBduk incorporate a non-deterministic component, or is there something else at play here?
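The zip-size explanations are easy to demonstrate: the same content compressed with different settings yields different .gz bytes, which is why comparisons should always be made on the decompressed streams. A sketch:

```shell
# Demo: identical content, different gzip settings -> the .gz files
# differ byte-for-byte, but the decompressed streams compare equal.
printf '@r1\nACGT\n+\nFFFF\n' > reads.fq
gzip -c -1 reads.fq    > a.fq.gz
gzip -c -9 -n reads.fq > b.fq.gz       # -n also drops name/mtime metadata
cmp -s a.fq.gz b.fq.gz || echo "gz bytes differ"
cmp -s <(zcat a.fq.gz) <(zcat b.fq.gz) && echo "content identical"
```

For the non-determinism question, running the identical BBduk command twice on one machine and comparing the decompressed outputs the same way would show whether the tool itself is the source of the variation.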
If someone has a solution or a proposed analysis for answering this question, please let me know!
Brian Bushnell and genomax, do you perhaps have suggestions for solving this?
Exactly the same version of bbtools on both machines?
Yes, 38.79 I think. However, the conda environments differed in other respects, so same software version, but not identical conda environments.
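Since the environments are not identical, diffing their package lists might reveal the culprit (e.g. a different JDK, or pigz present on only one machine). A sketch with stand-in lists; in practice each would come from `conda list --export` on the respective machine, and the package names/versions below are made up:

```shell
# Demo: diff two stand-in environment exports. On the real machines,
# generate these with `conda list --export > env.txt` and copy them over.
printf 'bbmap=38.79\nopenjdk=11.0.1\npigz=2.4\n' > local_env.txt
printf 'bbmap=38.79\nopenjdk=8.0.152\n'          > hpc_env.txt
diff local_env.txt hpc_env.txt || true   # shows JDK mismatch, missing pigz
```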