Dear Biostars-Community,
Our beautiful SLURM cluster is attached to a NAS system (TrueNAS SCALE, which uses ZFS under the hood). To keep scientific life as clean as possible, we use a shared file system to provide project-related data as well as home directories (all SLURM nodes run Ubuntu 22.04 LTS). This is where the fun begins.

Currently, we mount the remote file systems via the SMB protocol, which is horribly slow when it comes to reading and writing large numbers of small files (installing conda took an overwhelming 12 min!). We have already tuned it to the best of our knowledge, but it is still not as performant as we would like, especially given that our internal network runs at 10 Gbit/s or more. We also tried NFS, with the same result.

To take the distributed nature of a cluster into account, we also experimented with Ceph and GlusterFS. Since these systems are distributed, our overall storage capacity will be diminished (a solvable problem!), but, and this is what surprised me, neither Ceph nor GlusterFS was faster by any outstanding margin.

Since file systems are not my forte, I would not be surprised to have overlooked something. Any suggestions on this topic from your side?
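To make the comparison between protocols reproducible, a rough micro-benchmark along the following lines could be timed on each candidate mount; the path, file count, and file size below are placeholders rather than our actual layout. It mimics the many-tiny-files pattern that makes a conda install so painful:

```python
#!/usr/bin/env python3
"""Minimal sketch: time the creation of many small files on a mount point.

Assumptions (adjust for your setup): TARGET is a hypothetical directory on
the shared mount; 5000 files of 4 KiB each roughly mimics the
metadata-heavy I/O pattern of a conda install.
"""
import os
import shutil
import time

TARGET = "/mnt/shared/io_benchmark"   # placeholder mount point, change as needed
N_FILES = 5000
PAYLOAD = b"x" * 4096                 # 4 KiB per file

os.makedirs(TARGET, exist_ok=True)

start = time.perf_counter()
for i in range(N_FILES):
    with open(os.path.join(TARGET, f"file_{i:05d}.bin"), "wb") as fh:
        fh.write(PAYLOAD)
elapsed = time.perf_counter() - start

print(f"Wrote {N_FILES} small files in {elapsed:.1f} s "
      f"({N_FILES / elapsed:.0f} files/s)")

shutil.rmtree(TARGET)                 # clean up the benchmark directory
```

Running the same script against local disk first gives a baseline to compare the shared mounts against.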
Thanks in advance and cheers!
Sounds to me like there is some kind of hardware bottleneck. Is 10G Ethernet being used end to end? Are the file systems in the same VLAN, and/or is the traffic being scanned by a deep packet inspection device? If it is being scanned, getting it exempted from the scans should improve performance.
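To rule the network out quickly, iperf3 between a compute node and the NAS is the standard check. If installing tools is awkward, a dependency-free sketch like the one below gives a rough end-to-end throughput number; the port, transfer size, and invocation are placeholders:

```python
#!/usr/bin/env python3
"""Minimal sketch: raw TCP throughput between two cluster nodes.

Run `python3 net_check.py server` on one node and
`python3 net_check.py client <server-ip>` on another (file name and port
are placeholders). A result far below ~10 Gbit/s suggests the network
path, not the filesystem protocol, is the limiting factor.
"""
import socket
import sys
import time

PORT = 5201                     # arbitrary placeholder port
CHUNK = b"\0" * (1 << 20)       # 1 MiB send buffer
TOTAL_BYTES = 2 * (1 << 30)     # 2 GiB per test

def server():
    # Accept one connection and count the bytes received.
    with socket.create_server(("0.0.0.0", PORT)) as srv:
        conn, addr = srv.accept()
        with conn:
            received = 0
            while True:
                data = conn.recv(1 << 20)
                if not data:
                    break
                received += len(data)
        print(f"received {received / 1e9:.1f} GB from {addr[0]}")

def client(host):
    # Stream TOTAL_BYTES to the server and report the achieved rate.
    with socket.create_connection((host, PORT)) as sock:
        sent = 0
        start = time.perf_counter()
        while sent < TOTAL_BYTES:
            sock.sendall(CHUNK)
            sent += len(CHUNK)
        elapsed = time.perf_counter() - start
    print(f"{sent * 8 / elapsed / 1e9:.2f} Gbit/s over {elapsed:.1f} s")

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```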
Ultimately it may be your NAS hardware that is the issue. If the storage head nodes are tapped out in terms of performance, getting larger nodes may be the only solution.
I think these issues are best addressed by local people who have already done similar things. It is something to know and plan ahead of time rather than hoping for advice from strangers on the internet.