Hello all,
I am hoping for help running oases_pipeline.py. I am able to run the pipeline on a subset of my data and for 4 Kmer values using the following command:
python oases_pipeline.py -m 21 -M 27 -s 2 -o oases_test -d '-fastq -shortPaired -separate trimm_15_F_paired.fq trimm_15_R_paired.fq' -p '-ins_length 160'
This runs successfully and produces output for K=21 through 27. However, when I run the following command on my full dataset (538769828 sequences) with a larger range of K, it fails:
python oases_pipeline.py -m 21 -M 51 -s 2 -o oases_ALL -d '-fastq -shortPaired -separate ALL_F_paired.fq ALL_R_paired.fq' -p '-ins_length 160'
This command runs successfully for K=21, but then crashes on K=23 with this output:
[5141.379366] Inputting sequence 66000000 / 538769828
[5163.243577] Inputting sequence 67000000 / 538769828
[5170.824776] === Sequences loaded in 997.337692 s
[5171.829179] Done inputting sequences
[5171.829187] Destroying splay table
[5173.870477] Splay table destroyed
[5175.177294] Command failed!
[5175.177304] rm -f oases_ALL_23/Sequences
Hash failed
I am at a loss as to why it runs on a subset of the data and for the first K, but crashes on the second.
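One way to narrow this down might be to run the hashing step for K=23 on its own and look at how the process ends. Below is a minimal sketch, assuming the pipeline's hashing step corresponds to a plain velveth call that takes the same data arguments passed via -d; that correspondence is an assumption and has not been checked against the script. The helper names and the output directory name are hypothetical.

```python
import shlex
import subprocess


def build_velveth_command(out_dir, k, data_args):
    """Build the argument list for a single-K velveth run.

    Assumption: velveth is invoked as `velveth <dir> <k> <data args>`,
    with <data args> being the same string given to the pipeline via -d.
    """
    return ["velveth", out_dir, str(k)] + shlex.split(data_args)


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable note."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the process was
        # terminated by the signal with that number.
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_single_k(out_dir, k, data_args):
    """Run velveth once for a single K and report how it ended."""
    cmd = build_velveth_command(out_dir, k, data_args)
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

For the failing case this could be called as run_single_k("velveth_k23_test", 23, "-fastq -shortPaired -separate ALL_F_paired.fq ALL_R_paired.fq"). The returned note would at least show whether the process exited with an error status of its own or was terminated by a signal.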
Many thanks in advance for any input!