Speeding up minimap2
2.1 years ago

Dear all,

Has anyone done any benchmarking on speeding up long-read alignment algorithms?

I mainly use minimap2, but its runtime varies by a factor of 10 across our cluster. I've been trying mm2-fast (https://github.com/bwa-mem2/mm2-fast), the partially accelerated version, but without much success so far.

For example, is PAF output faster than SAM?

Have others worked out how to scale minimap2 to PromethION-scale datasets? Judging by their presented results, I expect LRA (https://github.com/ChaissonLab/LRA) to have a similar runtime, and others seem slower still (NGMLR etc.).

Thanks

alignment nanopore efficiency longread • 3.1k views

You are already using multiple threads and are asking for an additional speedup? As a sledgehammer solution, you could split the data files up and start multiple jobs in parallel. I have never worked with PromethION-size data, so I don't have direct insight there.
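The splitting step could be sketched roughly as below. This is only an illustration: the function names and the chunk file naming are made up here, and it assumes strict 4-line FASTQ records (no wrapped sequence lines).

```python
import gzip
import itertools
from pathlib import Path


def open_text(path, mode="rt"):
    """Open plain or gzip-compressed text, chosen by file extension."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def split_fastq(fastq_path, n_chunks, out_dir):
    """Distribute FASTQ records round-robin over n_chunks files.

    Assumes strict 4-line records. Returns the list of chunk paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: chunk_000.fastq, chunk_001.fastq, ...
    chunk_paths = [out_dir / f"chunk_{i:03d}.fastq" for i in range(n_chunks)]
    handles = [open(p, "w") for p in chunk_paths]
    try:
        with open_text(fastq_path) as reads:
            for i in itertools.count():
                record = list(itertools.islice(reads, 4))  # one 4-line record
                if not record:
                    break
                handles[i % n_chunks].writelines(record)
    finally:
        for handle in handles:
            handle.close()
    return chunk_paths
```

Each chunk can then be submitted as its own minimap2 job against the same reference. Round-robin assignment keeps the chunks similar in size, which helps the jobs finish at about the same time.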


The input splitting is an interesting idea, which comes at the expense of using far more CPU resources. I'll give this a try but also benchmark with some other options.

Edit: I am using 24 threads, as I found that to be the fastest on my infrastructure when benchmarking with hyperfine.
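For anyone without hyperfine, a thread-count scan can be approximated with a few lines of Python. This is a rough stand-in, not a replacement for hyperfine: `time_command` and `scan_threads` are hypothetical helpers, and the median of a handful of runs is only a crude estimate on a shared cluster.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout so that writing alignments to the terminal is not timed.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def scan_threads(make_cmd, thread_counts, runs=3):
    """Time make_cmd(t) for each thread count t; return {t: median seconds}."""
    return {t: time_command(make_cmd(t), runs) for t in thread_counts}
```

A call might look like `scan_threads(lambda t: ["minimap2", "-x", "map-ont", "-t", str(t), "ref.fa", "subset.fq"], [8, 16, 24, 32])`, where `ref.fa` and `subset.fq` are placeholder file names for a reference and a small read subset.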

Thanks

2.1 years ago
shelkmike ★ 1.4k

1) Yes, Minimap2 is faster when it writes PAF files than when it writes SAM files.
2) For a speedup at the cost of accuracy, you can increase the minimizer length ("-k") and window length ("-w").
3) You can increase "-I". If the reference is larger than 4 Gbp, this will accelerate Minimap2 at the cost of increased RAM consumption.

Also, see https://github.com/lh3/minimap2/issues/322
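To make the three options concrete, here is a small sketch that only assembles a minimap2 command line and does not run it. The helper `minimap2_command`, the file names and the numeric values are illustrative assumptions, not recommendations; the flags themselves (`-x`, `-t`, `-a`, `-k`, `-w`, `-I`) are standard minimap2 options.

```python
import shlex


def minimap2_command(reference, reads, *, preset="map-ont", threads=24,
                     sam=False, kmer=None, window=None, index_batch=None):
    """Build a minimap2 argument list (the command is not executed here)."""
    # The preset goes first: options given after -x override its values.
    cmd = ["minimap2", "-x", preset, "-t", str(threads)]
    if sam:
        cmd.append("-a")                  # SAM output; without -a minimap2 writes PAF
    if kmer is not None:
        cmd += ["-k", str(kmer)]          # minimizer k-mer length
    if window is not None:
        cmd += ["-w", str(window)]        # minimizer window size
    if index_batch is not None:
        cmd += ["-I", index_batch]        # target bases loaded per index batch, e.g. "16G"
    return cmd + [reference, reads]


# Illustrative values only: PAF output, larger k and w, larger index batch.
print(shlex.join(minimap2_command("ref.fa", "reads.fq",
                                  kmer=17, window=15, index_batch="16G")))
# prints: minimap2 -x map-ont -t 24 -k 17 -w 15 -I 16G ref.fa reads.fq
```

Passing the returned list to `subprocess.run` with stdout redirected to a file would run the alignment. Any change to `-k` or `-w` is worth checking against a subset aligned with the defaults.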


For a speedup at cost of accuracy you can increase the minimizer length ("-k") and window length ("-w").

Exactly. I wouldn't do that if the sequencing reads come from PacBio (CLR) or Oxford Nanopore.


By default, for Nanopore reads Minimap2 uses -k 15 -w 10. These parameters were intended for older Nanopore reads, which often had an average accuracy below 90%. I think that for modern Nanopore reads, which have an average accuracy >95%, a user can increase -k and -w to some extent without compromising alignment accuracy.
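To see why a larger -w reduces the work, here is a toy minimizer counter. It is a simplified sketch, not Minimap2's implementation: it looks at the forward strand only, uses CRC32 as a stand-in hash, and runs on a random sequence. With a random ordering of k-mers, roughly 2/(w+1) of positions are expected to be selected, so a larger window yields fewer seeds to chain.

```python
import random
import zlib


def count_minimizers(seq, k, w):
    """Count distinct minimizer positions over all windows of w consecutive k-mers.

    Simplified: forward strand only, CRC32 as the k-mer hash.
    """
    hashes = [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        offset = min(range(w), key=window.__getitem__)  # leftmost smallest hash
        picked.add(start + offset)
    return len(picked)


random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(20000))
# The first pair matches the defaults quoted above; the others are illustrative.
for k, w in [(15, 10), (17, 15), (19, 20)]:
    n = count_minimizers(seq, k, w)
    print(f"k={k} w={w}: {n} minimizers, about 1 per {len(seq) / n:.1f} bp")
```

Fewer seeds is also where the accuracy risk comes from: a longer k-mer is more likely to contain a sequencing error, and a sparser sample leaves fewer anchors per read.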


Yes, I know. My point is that data quality depends not only on the sequencing chemistry but also on the sequencing performance, the library preparation, the input DNA, etc. I would increase -k for HiFi reads and/or when the coverage is high.
