Seeking Advice: Best Network-based Backup Solutions (NAS or Alternatives) for a Small Lab
10 weeks ago
phhelou5

Hello!

I'm a PhD student in Paris working in the field of cellular biology and bioinformatics. My lab head and I are looking to set up local network-based backup drives for the lab's computers (mainly Macs) for all of our snRNA-Seq datasets. I was considering setting up a NAS or something similar, but I'm still quite unfamiliar with the different options and the possible redundancies we could implement. It would cover 7 to 10 computers.

I would greatly appreciate any advice or information on the options that might suit our needs. Thank you!

Tags: dataset, Backup, Hardware, NAS

Thank you all for your answers and diverse points of view.

While I agree with what GenoMax is saying, I happen to be the most tech-savvy individual in the lab at the moment, and I actually enjoy handling these tasks. Our institute has mentioned plans to implement an online storage system, but it's been a year, and we haven’t heard a single word of progress. Additionally, the head of IT literally cannot deal with macOS—he asks me to input the password because he "doesn't do Macs" and even claimed that the power brick is where the internals of the latest iMac are located, which I found baffling coming from the head of IT. Connecting it to the network is easy enough.

So, essentially, it's either we handle it ourselves or don't have it at all. We simply want an automated, network-based backup system in case the original files on the computers are somehow lost.

My biggest concern isn't losing data, but rather spending lab funds on a system that doesn’t function as we’d like. That’s why I’m doing as much research as possible before committing.

I’ll go for a RAID 5 system. If anyone has recommendations on the models and software to use, please do share.
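For sizing, the usual RAID 5 arithmetic applies: with N identical disks of size S, one disk's worth of capacity goes to distributed parity, so usable space is (N - 1) × S and the array survives the failure of any one disk. Below is a minimal Python sketch. The 4-bay, 8 TB example is an illustrative assumption, not a recommendation. A real NAS will show somewhat less than this figure because of filesystem overhead and TB-versus-TiB reporting.

```python
def raid5_usable_tb(num_disks: int, disk_tb: float) -> float:
    """Usable capacity of a RAID 5 array built from identical disks.

    One disk's worth of space holds distributed parity, so usable space is
    (N - 1) * S. RAID 5 needs at least 3 disks and tolerates one disk failure.
    """
    if num_disks < 3:
        raise ValueError("RAID 5 requires at least 3 disks")
    if disk_tb <= 0:
        raise ValueError("disk size must be positive")
    return (num_disks - 1) * disk_tb


if __name__ == "__main__":
    # Illustrative sizing only: a 4-bay unit filled with 8 TB disks.
    print(raid5_usable_tb(4, 8.0))  # 24.0
```

RAID protects against a disk failure, not against accidental deletion or corruption, which hits every disk at once. That is why a separate backup of the NAS itself is still worth considering.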


Unless you are really tech-savvy, don't plan to roll your own (though one certainly can). Go with Synology or QNAP (search for these companies and you will find their products); you get the idea.

What is your budget?

We simply want an automated, network-based backup system

As in, do you want the system to mount as a share on the local desktops, or do you want the local systems to back data up to the NAS automatically? A NAS file share and a plain data backup server are subtly different applications.


I would suggest thinking in reverse: the NAS with RAID is the place to store primary data/metadata. Data on the workstations should be just a copy (or a mounted partition) of what is properly organized into neat folders and resides on the NAS. That gives one point of reference, with no digging through whatever naming conventions individual end users may have.
Then, on top of that, you should consider backing up the NAS itself, be it to Amazon Glacier or another cloud/storage service.

Re primary/raw data:
Enforce naming conventions, suffixes, metadata and checksums. Hunting for files with generic names from the sequencing core scattered across random folders makes no sense. Tag/update information about the data already deposited in ENA/SRA.
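For the checksum part, a manifest written when a dataset first lands on the NAS lets you check later that nothing was silently corrupted or altered. Below is a minimal Python sketch, assuming one folder per dataset. The manifest name `MANIFEST.sha256` is a hypothetical choice. Lines use the `<hash>  <relative path>` layout of `shasum -a 256` output, so on a Mac the manifest should also be checkable with `shasum -a 256 -c MANIFEST.sha256` run from inside the dataset folder.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

MANIFEST_NAME = "MANIFEST.sha256"  # hypothetical file name


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large FASTQ/BAM files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path) -> Path:
    """Record '<sha256>  <relative path>' for every file under root."""
    manifest = root / MANIFEST_NAME
    lines = [
        f"{sha256_of(path)}  {path.relative_to(root).as_posix()}"
        for path in sorted(root.rglob("*"))
        if path.is_file() and path != manifest  # never hash the manifest itself
    ]
    manifest.write_text("".join(line + "\n" for line in lines))
    return manifest


def verify_manifest(root: Path) -> list[str]:
    """Return relative paths that are missing or whose hash no longer matches."""
    problems = []
    for line in (root / MANIFEST_NAME).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems
```

Run `write_manifest` once when a dataset is deposited, and run `verify_manifest` periodically or after restoring from a backup. An empty list means every file still matches.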


I agree with both of the other comments. I can't remember which system we had in our lab, but it was from a company with clear instructions. Looking at Synology, I am pretty sure it was from them. It was pretty easy to set up and we never had any major issues. I believe we did have to get IT to make the Ethernet port it was connected to accessible from the local machines.

We had it mounted on all of the local machines and had a strict schema for putting data on it, as Darked89 suggests. Enforce it strongly and early, as things will cascade into a mess if you don't. Your PI should make you the de facto czar of the system and make that clear to everyone who will be using it. Also be sure the notifications for failed backups and failing disks have some redundancy and let multiple people know.
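Enforcing a schema "strongly and early" is easier with a script that lists the files that break it. Below is a minimal Python sketch. The naming convention (`YYYYMMDD_project_sample[_R1|_R2].suffix`) and the list of allowed suffixes are hypothetical examples, not a standard, so substitute whatever schema the lab agrees on.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical lab convention, for illustration only:
#   YYYYMMDD_project_sample[_R1|_R2].suffix   e.g. 20240115_liverAtlas_S01_R1.fastq.gz
# The date part is checked for shape (8 digits), not for being a real calendar date.
NAME_PATTERN = re.compile(
    r"^\d{8}_[A-Za-z0-9]+_[A-Za-z0-9-]+(?:_R[12])?"
    r"\.(?:fastq\.gz|bam|bai|h5ad|tsv)$"
)


def nonconforming(root: Path) -> list[str]:
    """Relative paths of files under root whose names break the convention."""
    bad = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip folders and hidden files such as macOS .DS_Store
        if not NAME_PATTERN.match(path.name):
            bad.append(path.relative_to(root).as_posix())
    return bad
```

Running it regularly, for example before each backup, keeps any cleanup small.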

10 weeks ago

We used a RAID 5 setup with a NAS in my graduate lab for such data. It worked well enough, though keeping it connected to the network was a bit of a pain at times. You may have to work with your IT department to get it set up such that it's easily accessible from all computers.

Losing no data when a disk actually failed was nice, though. It's relatively cheap to keep going; most of the cost is upfront.

I'd also consider cloud storage at this point depending on how much data you have, how accessible it needs to be, etc.

10 weeks ago
Darked89

No matter which system you use, the first thing is to select the stuff you cannot survive without (primary data, metadata, data analysis pipelines, etc.) and make sure you do not have X perfect or not-so-perfect copies of it on Y laptops/workstations/servers, mixed up with stuff that can and should be cleaned up after the analysis is done. Also: say no to giant tar/tar.gz files (especially with mixed content), tar files inside tar files, and an extra level of compression on already compressed FASTQ or BAM files. Backups are of limited use if restoring crucial files takes forever.
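One way to act on the tar/compression advice above is a quick audit of what is about to be backed up. Below is a minimal Python sketch. It assumes that file extensions reflect the actual format (they may not), and the suffix lists are illustrative assumptions, not exhaustive. It flags names with a compression layer on top of an already compressed format, and it lists tar archives nested inside a tar file.

```python
from __future__ import annotations

import tarfile
from pathlib import Path

# Illustrative suffix lists (assumptions, not exhaustive). BAM and CRAM are
# compressed internally, so gzipping or zipping them again gains little.
ALREADY_COMPRESSED = {".gz", ".bgz", ".bz2", ".xz", ".zst", ".zip", ".tgz", ".bam", ".cram"}
COMPRESSION_LAYERS = {".gz", ".bgz", ".bz2", ".xz", ".zst", ".zip"}
TAR_NAMES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def has_stacked_compression(path: Path) -> bool:
    """True for names like reads.fastq.gz.gz or sample.bam.gz."""
    suffixes = [s.lower() for s in path.suffixes]
    return any(
        inner in ALREADY_COMPRESSED and outer in COMPRESSION_LAYERS
        for inner, outer in zip(suffixes, suffixes[1:])
    )


def nested_archives(path: Path) -> list[str]:
    """Names of tar archives stored inside a (possibly compressed) tar file.

    Listing a compressed tar requires decompressing the whole stream, which is
    also why restoring one file from a giant archive is slow.
    """
    with tarfile.open(path, "r:*") as archive:
        return [name for name in archive.getnames() if name.lower().endswith(TAR_NAMES)]
```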


Thank you, I hadn't thought of that!

10 weeks ago
GenoMax

I'm a PhD student

If you are trying to do this, then leave it to professionals (if you can), especially if you are an experimental biologist. While it is tempting to work on and can be very interesting, it will take you away from your research and require time to configure, set up and administer. You will basically be taking responsibility for data that belongs to the lab, and any mishaps will be blamed squarely on you. On the other hand, you will get zero thanks for doing the job if all goes well.

There are also institutional security/administrative/IT policies to consider. No institution will allow devices to connect to or remain on the network if the relevant policies are not followed, so take that into consideration before you go too far down this road.

There are turnkey solutions out there for purchase (Synology is one example, if your data needs are relatively small) that get you up and running with minimal effort (but with a premium added to the price).


While in a perfect world research institutes would have well-staffed IT support, this is often not the case. So "part-time system admins" recruited among group members can be spotted in the wild. I share the general sentiment: more work and responsibility, often no glory, never extra $$$. On the other hand, it is an opportunity to learn and hopefully improve things.

The important thing is to select essential data/duties (like storing primary data) and reject the silly idea of becoming an unpaid servant for everyone in the group, i.e. searching for and retrieving version.xyz of some Word doc. I would not touch backups of workstations with a barge pole...


The important thing is to select essential data/duties (like storing primary data) and reject the silly idea of becoming an unpaid servant for everyone in the group, i.e. searching for and retrieving version.xyz of some Word doc. I would not touch backups of workstations with a barge pole...

This is a great point. People should be explicitly putting data on whatever system you set up, and that's the only stuff that gets backed up. It should be clear that it's only for set purposes. Docs, presentations and such are best left to backups that sync to a cloud service.

I also agree with GenoMax that it's best left to professionals, but I have also been in situations where that's just not going to happen.
