Question

Speeding up minimap2

1

2.1 years ago

colindaven 7.0k

Dear all,

has anyone done any benchmarking on speeding up long-read alignment algorithms?

I mainly use minimap2, but its runtime varies by a factor of 10 across our cluster. I've been trying mm2-fast (https://github.com/bwa-mem2/mm2-fast), the partially accelerated version, but without much success so far.

Is PAF output, for example, faster than SAM?

Have others worked out how to scale minimap2 to PromethION-scale datasets? Judging from their presented results, I expect LRA (https://github.com/ChaissonLab/LRA) has a similar runtime, and others seem slower still (ngmlr etc.).

Thanks

alignment nanopore efficiency longread • 3.1k views

ADD COMMENT • link 2.0 years ago by colindaven 7.0k

1

You are already using multiple threads and are asking for an additional speedup? You could split the data files and start multiple jobs in parallel as a sledgehammer solution. I have never worked with PromethION-size data, so I don't have direct insight there.
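As a rough illustration of the splitting idea, here is a minimal Python sketch that deals FASTQ records round-robin into a fixed number of chunks. It assumes strict 4-line FASTQ records (no wrapped sequence lines). The function name `split_fastq` and the toy records are hypothetical, made up for this example.

```python
from itertools import islice


def split_fastq(lines, n_chunks):
    """Distribute 4-line FASTQ records round-robin into n_chunks lists of lines."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    it = iter(lines)
    index = 0
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        chunks[index % n_chunks].extend(record)
        index += 1
    return chunks


# Toy input: five made-up reads, split into two chunks.
toy_lines = []
for i in range(5):
    toy_lines += [f"@read{i}", "ACGTACGT", "+", "IIIIIIII"]

toy_chunks = split_fastq(toy_lines, n_chunks=2)
print([len(chunk) // 4 for chunk in toy_chunks])
```

Each chunk would then be written to its own file and aligned as a separate job. Every job loads its own copy of the reference index, so memory use grows with the number of jobs.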

ADD REPLY • link 2.1 years ago by GenoMax 147k

0

Splitting the input is an interesting idea, though it comes at the expense of using far more CPU resources. I'll give it a try but also benchmark some other options.

Edit: I am using 24 threads, as I have found that to be fastest on my infrastructure when benchmarking with hyperfine.
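hyperfine is a command-line tool. For readers who want the same kind of repeated-timing comparison from a script, here is a minimal Python stand-in. It is not how the benchmarking above was done, and the helper name `time_command` is hypothetical. The demo times the Python interpreter itself so that the snippet runs anywhere; in practice the command list would be a minimap2 invocation with the thread count under test.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return (mean, best) wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are never timed.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), min(timings)


# Placeholder command so the sketch is runnable without minimap2 installed.
mean_s, best_s = time_command([sys.executable, "-c", "pass"], runs=2)
print(f"mean {mean_s:.3f} s, best {best_s:.3f} s")
```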

Thanks

ADD REPLY • link 2.0 years ago by colindaven 7.0k

score 3 · Accepted Answer · 2022-11-07

3

2.0 years ago

shelkmike ★ 1.4k

1) Yes, Minimap2 is faster when it writes PAF files than when it writes SAM files.
2) For a speedup at the cost of accuracy, you can increase the minimizer length ("-k") and the window length ("-w").
3) You can increase "-I". If the reference is larger than 4 Gbp, this will accelerate Minimap2 at the cost of increased RAM consumption.

Also, see https://github.com/lh3/minimap2/issues/322
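The three options above map onto minimap2 command-line flags roughly as in the Python sketch below, which only assembles an argument list and does not run anything. The helper name `build_minimap2_cmd`, the file names, and the example values (`-k 17 -w 15 -I 16G`) are hypothetical, chosen for illustration and not recommended settings. One likely reason for the PAF speedup is that without `-a` (SAM) or `-c` (CIGAR in PAF), minimap2 reports approximate mappings and skips base-level alignment.

```python
def build_minimap2_cmd(reference, reads, threads=24, preset="map-ont",
                       sam=False, k=None, w=None, index_batch=None):
    """Assemble a minimap2 argument list; values left as None keep the preset's defaults."""
    # The preset goes first so that later -k/-w values override it.
    cmd = ["minimap2", "-x", preset, "-t", str(threads)]
    if sam:
        cmd.append("-a")  # SAM output; omitted means PAF output
    if k is not None:
        cmd += ["-k", str(k)]  # minimizer k-mer length
    if w is not None:
        cmd += ["-w", str(w)]  # minimizer window size
    if index_batch is not None:
        cmd += ["-I", str(index_batch)]  # bases loaded per index batch
    cmd += [reference, reads]
    return cmd


# Hypothetical file names and parameter values.
paf_cmd = build_minimap2_cmd("ref.fa", "reads.fq", k=17, w=15, index_batch="16G")
sam_cmd = build_minimap2_cmd("ref.fa", "reads.fq", sam=True)
print(" ".join(paf_cmd))
print(" ".join(sam_cmd))
```

The resulting list can be passed to `subprocess.run` with stdout redirected to the output file.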

ADD COMMENT • link 2.0 years ago by shelkmike ★ 1.4k

0

For a speedup at cost of accuracy you can increase the minimizer length ("-k") and window length ("-w").

Exactly, I wouldn't do that if the sequencing reads come from PacBio (CLR) or Oxford Nanopore.

ADD REPLY • link 2.0 years ago by Buffo ★ 2.4k

0

By default, for Nanopore reads Minimap2 uses -k 15 -w 10. This combination of parameters was intended for old Nanopore reads which often had an average accuracy below 90%. I think that for modern Nanopore reads, which have an average accuracy >95%, a user can increase -k and -w to some extent without compromising the alignment accuracy.
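To see why a larger -w reduces the work, recall that minimizer sampling keeps the smallest k-mer in every window of w consecutive k-mers, so wider windows leave fewer seeds to look up and chain. The Python sketch below is a simplified illustration and not minimap2's implementation: it uses CRC32 as a stand-in hash, ignores the reverse strand, and runs on a random toy sequence. The function name `minimizer_positions` is hypothetical.

```python
import random
import zlib


def minimizer_positions(seq, k, w):
    """Return sorted start positions of the k-mers chosen as (w, k)-minimizers."""
    keys = [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(keys) - w + 1):
        window = keys[start:start + w]
        # Leftmost smallest key within this window of w consecutive k-mers.
        offset = min(range(w), key=window.__getitem__)
        picked.add(start + offset)
    return sorted(picked)


# Random toy sequence with a fixed seed so that runs are repeatable.
rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(5000))

n_w10 = len(minimizer_positions(seq, k=15, w=10))
n_w20 = len(minimizer_positions(seq, k=15, w=20))
print(f"w=10: {n_w10} minimizers, w=20: {n_w20} minimizers")
```

Doubling the window roughly halves the number of sampled seeds on random sequence, which is where the speedup comes from. The cost is that a noisy read has fewer chances of sharing an error-free seed with the reference.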

ADD REPLY • link 2.0 years ago by shelkmike ★ 1.4k

0

Yes, I know. My point is that the quality of the data depends not only on the sequencing chemistry but also on the sequencing performance, the library preparation, the input DNA, etc. I would increase -k for HiFi reads and/or when the coverage is high.

ADD REPLY • link 2.0 years ago by Buffo ★ 2.4k