Question

How many resources does HiCanu need to assemble PacBio sequences?


19 months ago

Ananthu • 0

Hi, I am planning to assemble a diploid genome using PacBio reads. We have 900 GB of raw reads. The genome size is ~1 Gbp. What resources do we need to get this done? How much RAM and how many CPUs do we need? We have HPC nodes with >100 GB of RAM and 26 CPUs. Will this be sufficient? Thank you.

HiCanu Genome assembly • 1.1k views

updated 19 months ago by Dave Carlson ★ 2.0k • written 19 months ago by Ananthu • 0


You may consider reaching out to the developers behind the program: https://github.com/marbl/canu

written 19 months ago by kalavattam ▴ 280

score 0 · Answer 1 · 2023-04-15

Since you have access to an HPC cluster, this should not be a problem. You can run Canu with useGrid=true, and Canu will parallelize the analysis across multiple nodes using your cluster's job scheduler.

You can also supply options that control how many threads and how much memory each process uses, so that the CPU and memory limits of your nodes are not exceeded. See more here:

https://canu.readthedocs.io/en/latest/tutorial.html#execution-configuration
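As a rough illustration of the approach above, here is a minimal Python sketch that assembles a Canu command line with grid execution enabled and per-job limits matching the nodes described in the question (26 CPUs, 100 GB RAM). The helper name `build_canu_command`, the prefix `asm`, the output directory `asm-out`, the file name `reads.fastq.gz`, and the scheduler string passed as `gridOptions` are all placeholders, not values from this thread. The `-pacbio-hifi` read flag assumes HiFi reads (which is what HiCanu is designed for); other read types need a different flag, so check the Canu documentation for your version before running anything.

```python
import shlex


def build_canu_command(prefix, out_dir, reads, genome_size="1g",
                       max_threads=26, max_memory_gb=100,
                       read_flag="-pacbio-hifi", grid_options=None):
    """Return a Canu command as an argument list (hypothetical helper).

    max_threads / max_memory_gb cap what any single Canu job may request,
    so they should not exceed what one compute node provides.
    """
    cmd = [
        "canu",
        "-p", prefix,                     # assembly name prefix
        "-d", out_dir,                    # output directory
        f"genomeSize={genome_size}",      # estimated genome size
        "useGrid=true",                   # submit jobs through the scheduler
        f"maxThreads={max_threads}",      # per-job CPU limit
        f"maxMemory={max_memory_gb}",     # per-job memory limit, in GB
    ]
    if grid_options:
        # Extra scheduler arguments; the value is site-specific (assumption).
        cmd.append(f"gridOptions={grid_options}")
    cmd.append(read_flag)
    cmd.extend(reads)
    return cmd


if __name__ == "__main__":
    command = build_canu_command(
        "asm", "asm-out", ["reads.fastq.gz"],
        grid_options="--time=72:00:00",   # placeholder scheduler option
    )
    # Print a shell-safe version of the command for review before running it.
    print(shlex.join(command))
```

Printing the command instead of executing it makes it easy to check the limits against your node specifications first. Setting the per-job caps slightly below the physical node limits leaves headroom for the scheduler and the operating system.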

I've assembled a ~1 Gbp, highly repetitive genome with Canu using this strategy, on a cluster with specs similar to the one you're describing.