6.3 years ago
kristina.mahan
I keep trying to assemble this 270 Mb genome de novo, but I think the assembler just stops without finishing. Why?
srun -c 100 -t 5000-12 python2 spades.py -1 ~/projects/algae/sequencing/KMM5_analysis/KMM5_output_forward_paired_trimmed -2 ~/projects/algae/sequencing/KMM5_analysis/KMM5_output_reverse_paired_trimmed -o ~/projects/algae/sequencing/KMM5_analysis/KMM5_spades_careful_assembly -t 100 -k 21,33,55,77 --careful
When I check the spades.log here's what the last part says:
3:47:15.147 166G / 171G INFO K-mer Counting (kmer_data.cpp : 382) Processing /home/kmmahan/projects/algae/ABY2/sequencing/KMM5_analysis/KMM5_output_forward_paired_trimmed.fastq
4:14:22.137 166G / 171G INFO K-mer Counting (kmer_data.cpp : 382) Processing /home/kmmahan/projects/algae/ABY2/sequencing/KMM5_analysis/KMM5_output_reverse_paired_trimmed.fastq
4:39:41.384 166G / 171G INFO K-mer Counting (kmer_data.cpp : 389) Collection done, postprocessing.
4:40:15.314 166G / 171G INFO K-mer Counting (kmer_data.cpp : 403) There are 5499739764 kmers in total. Among them 4071657118 (74.0336%) are singletons.
4:40:15.314 166G / 171G INFO General (main.cpp : 173) Subclustering Hamming graph
How do you know when the assembly is finished? I still have tmp files and I read that the tmp files would disappear once the assembly is done. Any advice would be appreciated.
The spades.log is incomplete, indicating that the operating system or queue manager killed your SPAdes run. As you had plenty of time left, I am guessing you reached the memory limit available. As you didn't specify an output file for SLURM, stdout and stderr should have been redirected to the terminal running SPAdes. Did you check the terminal for messages?
I think Spades emits a specific “Thanks for using Spades” type message when the run completes fully - so you’ll know unambiguously when it’s done (I wish more software did that actually).
Can you post more info about the resources available to you? (RAM etc). At the moment I’d be inclined to agree with h.mon that you’ve likely run out of memory.
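One way to check for that completion message without scrolling through the log is a small shell helper. This is only a sketch: the function name spades_finished is made up for illustration, and the default pattern assumes the final line reads roughly "Thank you for using SPAdes", which may differ between versions, so the pattern can be overridden.

```shell
#!/usr/bin/env bash
# Report whether a SPAdes log looks complete.
# Assumption: a finished run ends with a line containing "Thank you for using SPAdes".
# Pass a different pattern as the second argument if your version prints something else.
spades_finished() {
    local log="$1"
    local pattern="${2:-Thank you for using SPAdes}"
    [ -f "$log" ] || { echo "no such log: $log" >&2; return 2; }
    # Only the tail is checked, since the message is expected at the very end of the run.
    if tail -n 20 "$log" | grep -q -- "$pattern"; then
        echo "finished"
    else
        echo "incomplete"
        return 1
    fi
}

# Example call (output directory taken from the command in the question):
# spades_finished ~/projects/algae/sequencing/KMM5_analysis/KMM5_spades_careful_assembly/spades.log
```

If this prints "incomplete" and no SPAdes process is still running, the run was most likely killed rather than finished.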
I didn't see any messages in the terminal. Should I specify -m in my command, and how much should I specify? -m 500?
free -m
Total: 3095999 Used: 19999 Free: 1710756 Shared: 21 Buff/cache: 1365242 Available: 3073464
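As far as I know, the SPAdes -m option is given in gigabytes (default 250), while free -m reports mebibytes, so the numbers above need converting before they are passed to -m. A rough sketch of that arithmetic follows. The helper name spades_mem_gb and the 90% headroom are arbitrary choices for illustration, and the memory a SLURM job is actually allowed may be lower than what the node reports.

```shell
# Convert a MiB figure (as printed by `free -m`) into a whole-GB value for `-m`,
# keeping some headroom. The 90% default is an arbitrary illustrative choice.
spades_mem_gb() {
    local mib="$1"
    local pct="${2:-90}"
    echo $(( mib * pct / 100 / 1024 ))
}

# "Available" column from the free -m output above:
spades_mem_gb 3073464    # prints 2701
```

So on this node anything up to roughly -m 2700 would fit in what is currently available, and -m 500 is well within range.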
Ask your system administrator about the queues available, queue memory limits, and how to ask SLURM for more memory. I would also use sbatch instead of srun.
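To make the sbatch suggestion concrete, here is a sketch of a helper that writes a job script for the SPAdes command from the question. The helper name make_spades_job, the job name, the log file names and all resource values are hypothetical placeholders; the real memory, core and time limits have to come from your cluster's administrator.

```shell
# Print an sbatch script for the SPAdes command from the question.
# All resource values are placeholders supplied by the caller.
make_spades_job() {
    local mem_gb="$1"      # GB requested from SLURM and passed to SPAdes -m
    local threads="$2"     # CPU cores for SLURM and SPAdes -t
    local walltime="$3"    # SLURM time limit, e.g. 7-00:00:00
    local base='~/projects/algae/sequencing/KMM5_analysis'
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=spades_KMM5
#SBATCH --cpus-per-task=${threads}
#SBATCH --mem=${mem_gb}G
#SBATCH --time=${walltime}
#SBATCH --output=spades_%j.out
#SBATCH --error=spades_%j.err

python2 spades.py \\
    -1 ${base}/KMM5_output_forward_paired_trimmed \\
    -2 ${base}/KMM5_output_reverse_paired_trimmed \\
    -o ${base}/KMM5_spades_careful_assembly \\
    -t ${threads} -m ${mem_gb} -k 21,33,55,77 --careful
EOF
}

# Usage (values are examples only):
# make_spades_job 500 32 7-00:00:00 > spades_job.sh
# sbatch spades_job.sh
```

With --output and --error set, anything SLURM or the kernel reports when a job is killed ends up in those files instead of being lost with the terminal session.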
23:03:07.547 178G / 178G INFO General (hammer_tools.cpp : 175) Processed batch 34
23:03:15.307 178G / 178G INFO General (hammer_tools.cpp : 185) Written batch 34
23:03:25.251 166G / 178G INFO General (hammer_tools.cpp : 274) Correction done. Changed 131497146 bases in 83092063 reads.
23:03:25.252 166G / 178G INFO General (hammer_tools.cpp : 275) Failed to correct 401377 bases out of 100362196131.
23:03:25.971 400M / 178G INFO General (main.cpp : 255) Saving corrected dataset description to /home/kmmahan/projects/algae/ABY2/sequencing/KMM5_analysis/KMM5_spades_careful_assembly_09282018/corrected/corrected.yaml
23:03:25.973 400M / 178G INFO General (main.cpp : 262) All done. Exiting.
== Compressing corrected reads (with gzip)
It didn't say "Thanks for using Spades" so does that mean it's still going? Maybe this was just the error correction piece? I set -m 1000000 and that seems to be working so far. Thanks!
It’s been a while since I’ve run it so I’ll have to double check. What version of spades are you on?
SPAdes 3.12.0. The assembly is actually running now. So I bet it will still say "Thanks for using Spades" when it's done. Thanks a lot!