Hi,
I'm planning to sequence bacterial genomes on both Nanopore and Illumina platforms to get nearly complete and accurate genomes. Once I have the data, I'm going to run a hybrid assembly (e.g. with Unicycler) or a long-read assembly followed by short-read polishing.
In this case, is there any advantage to using the high-accuracy models (e.g. dna_r9.4.1_450bps_hac.cfg) for guppy basecalling? Or would the fast models (e.g. dna_r9.4.1_450bps_fast.cfg) be enough?
I'm testing the speed of guppy on my Ubuntu 18.04 machine with a GTX 1660 (CUDA 10.1), and the fast models seem to be more than 10X faster than the high-accuracy models.
Thanks.
Thank you for the reply.
I also feel that the hac models would be better for downstream analyses. I think I should spend more time optimizing the guppy parameters. Currently, the hac model is >10X (GPU) or >20X (CPU) slower than the fast model on my machine.
I haven't really optimised anything yet, apart from the obvious (CPU only).
I might be wrong, as I haven't done a comparative analysis of different setups, but we should do more testing.
I split the fast5 files into groups of 5 per subdirectory, then submit each folder to a Slurm cluster. I request 10 Slurm threads but set $cpus to 8 in the code linked below.
The code is at https://github.com/colindaven/guppy_on_slurm
If you can optimize further please let me know.
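For anyone who wants to try the splitting step without reading the repository first, here is a minimal sketch of how it could look. This is not the code from the link above; the function name split_fast5 and the batch_NNNN directory names are made up for the example, and it copies the files rather than moving them.

```sh
#!/bin/sh
# Sketch only: split the fast5 files in SRC into subdirectories of OUT,
# SIZE files per subdirectory (default 5, as described in the post above).
# split_fast5 and the batch_NNNN naming are hypothetical, not from the repo.
split_fast5() (
    src=$1
    out=$2
    size=${3:-5}
    i=0
    for f in "$src"/*.fast5; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Integer division maps files 0..size-1 to batch 0, and so on.
        batch=$(printf '%s/batch_%04d' "$out" $(( $i / $size )))
        mkdir -p "$batch"
        # cp keeps the originals; mv would avoid duplicating the data.
        cp "$f" "$batch/"
        i=$(( $i + 1 ))
    done
    # Print how many files were placed.
    echo "$i"
)
```

Each batch directory could then be given to a separate guppy_basecaller job as its -i input.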
I've been trying to optimize several parameters for guppy.
So far, the settings below have been the fastest on the GPU, but the speed-up is only ~20-25% over the defaults.
$ guppy_basecaller -i /fast5 -s /guppy -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0" --gpu_runners_per_device 4 --num_callers 4 --chunks_per_runner 2048
I haven't tried different settings for the CPU, because the GPU with default settings was slightly faster than the CPU even when I used 72 threads (of the 80 available on my dual Xeon Gold 6230). Unfortunately, I have no access to a cluster.
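In case it helps anyone compare settings on the same subset of reads, here is a small timing wrapper I would use as a starting point. It is only a sketch: time_run is a made-up name, and it reports whole seconds, so it is only useful for runs that take at least a few minutes.

```sh
#!/bin/sh
# Sketch only: run a command and print its wall-clock time in whole seconds.
# time_run is a hypothetical helper name, not part of guppy.
time_run() {
    start=$(date +%s)
    # Send the command's own stdout to stderr so that only the
    # elapsed seconds appear on stdout.
    "$@" >&2
    status=$?
    end=$(date +%s)
    echo $(( $end - $start ))
    # Pass the command's exit status through to the caller.
    return $status
}
```

For example, running `time_run guppy_basecaller -i /fast5 -s /guppy -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0"` once with the defaults and once with the extra options from the command above gives two numbers that can be compared directly, as long as both runs use the same input files.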