Question

Are there advantages in using high accuracy models for guppy basecaller when Illumina data are available?

5.6 years ago

ikangkim ▴ 50

Hi,

I'm planning to sequence bacterial genomes on both Nanopore and Illumina platforms to obtain nearly complete and accurate genomes. Once I have the data, I'll run either a hybrid assembly (e.g. Unicycler) or a long-read assembly followed by short-read polishing.

In this case, is there any advantage to using the high accuracy models (e.g. dna_r9.4.1_450bps_hac.cfg) for guppy basecalling? Or would the fast models (e.g. dna_r9.4.1_450bps_fast.cfg) be enough?

I'm benchmarking guppy on my Ubuntu 18.04 machine with a GTX 1660 (CUDA 10.1), and the fast models appear to run more than 10X faster than the high accuracy models.

Thanks.

Assembly genome sequencing • 6.9k views

score 1 · Accepted Answer · 2019-12-12

5.6 years ago

colindaven 7.7k

In my limited experience, bacterial genomes seem to basecall with higher accuracy than vertebrate genomes in both fast and hac modes (maybe because the DNA is fresher and of higher quality?).

I would use hac mode followed by Illumina polishing. Maybe you only get a 1% accuracy increase over fast mode, but 1% is worth having and is going to cause a LOT fewer problems downstream.

My speed comparisons indicate a ~7X difference between fast and hac modes on CPU.

Thank you for the reply.

I also feel that hac models would be better for downstream analyses, so I should spend more time optimizing guppy parameters. Currently, the hac model is >10X (GPU) or >20X (CPU) slower than the fast model on my machine.

ADD REPLY • link 5.6 years ago by ikangkim ▴ 50

I haven't really optimised anything yet beyond the obvious (CPU only).

I might be wrong, as I haven't compared different setups; we should do more testing.

I split the fast5 files into groups of 5 per subdirectory, then submit each folder to a Slurm cluster. I request 10 Slurm threads but set $cpus in the code below to 8.

The code is at https://github.com/colindaven/guppy_on_slurm

# $i is one fast5 subdirectory; $cpus is the number of callers (8 here)
# high accuracy, 7x+ slower (40+ hours)
guppy_basecaller -i "$i" -s "$i.guppy" --cpu_threads_per_caller 1 --num_callers "$cpus" -c dna_r9.4.1_450bps_hac.cfg
# fast, lower accuracy, 7x+ faster (6 hours?)
# guppy_basecaller -i "$i" -s "$i.guppy" --cpu_threads_per_caller 1 --num_callers "$cpus" -c dna_r9.4.1_450bps_fast.cfg
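
The batching step described above (groups of 5 fast5 files per subdirectory) could be sketched roughly as follows. This is only an illustration, not the code from the linked repository: the function name `split_fast5_into_batches`, the `batch_N` directory naming, and the `run_guppy.sh` wrapper in the usage comment are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: move the *.fast5 files in a directory into numbered
# subdirectories (batch_0, batch_1, ...) holding at most N files each.
# Function name and batch_N naming are assumptions for illustration.

split_fast5_into_batches() {
    local src="$1"          # directory containing the fast5 files
    local per_dir="${2:-5}" # files per subdirectory (5, as in the post)
    local n=0 batch f

    for f in "$src"/*.fast5; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        batch="$src/batch_$(( n / per_dir ))"
        mkdir -p "$batch"
        mv "$f" "$batch/"
        n=$(( n + 1 ))
    done
    echo "$n"                              # print the number of files moved
}

# Usage (hypothetical paths and wrapper script):
#   split_fast5_into_batches /path/to/fast5 5
#   for d in /path/to/fast5/batch_*; do sbatch run_guppy.sh "$d"; done
```

Each resulting subdirectory can then be passed as `$i` to the guppy_basecaller command above.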

If you can optimize further please let me know.

ADD REPLY • link 5.6 years ago by colindaven 7.7k

I've been trying to optimize several guppy parameters.

So far, the settings below have been the fastest on GPU, but the speedup is only ~20-25% over the defaults.

$ guppy_basecaller -i /fast5 -s /guppy -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0" --gpu_runners_per_device 4 --num_callers 4 --chunks_per_runner 2048

I haven't tried different settings for CPU, because the GPU with default settings was slightly faster than the CPU even when I used 72 threads (of the 80 available on a dual Xeon Gold 6230). Unfortunately, I have no access to a cluster.
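
For anyone repeating this kind of comparison, a minimal timing sketch is shown below. It assumes that whole-second wall-clock time is precise enough and that the ~20-25% figure is read as a reduction in run time; both are assumptions. The helper names `elapsed_seconds` and `time_saved_percent` and the output directories in the usage comment are hypothetical. Only the guppy flags already shown in this thread are used.

```shell
#!/usr/bin/env bash
# Sketch: time a command and compare two runs.
# Helper names are hypothetical; resolution is whole seconds.

# elapsed_seconds CMD [ARGS...]
# Runs the command (its output is discarded) and prints wall-clock seconds.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# time_saved_percent BASELINE_SECONDS TUNED_SECONDS
# Prints the reduction in run time relative to the baseline, as an
# integer percentage (integer division, so the result is truncated).
time_saved_percent() {
    local base="$1" tuned="$2"
    echo $(( (base - tuned) * 100 / base ))
}

# Usage (output directories are placeholders):
#   base=$(elapsed_seconds guppy_basecaller -i /fast5 -s /guppy_default \
#       -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0")
#   tuned=$(elapsed_seconds guppy_basecaller -i /fast5 -s /guppy_tuned \
#       -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0" \
#       --gpu_runners_per_device 4 --num_callers 4 --chunks_per_runner 2048)
#   time_saved_percent "$base" "$tuned"
```

Running both configurations on the same input, ideally more than once, keeps the comparison fair.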

ADD REPLY • link 5.6 years ago by ikangkim ▴ 50