Hello!
I'm a PhD student in Paris working in cellular biology and bioinformatics, and my lab head and I are looking to set up local network-based backup drives for the lab's computers (mainly Macs) for all of our snRNA-Seq datasets. I was considering setting up a NAS or something similar, but I'm still quite unfamiliar with the different options and the possible redundancies we could implement. The setup would cover between 7 and 10 computers.
I would greatly appreciate any advice or information on the options that might suit our needs. Thank you!
Thank you all for your answers and diverse points of view.
While I agree with what GenoMax is saying, I happen to be the most tech-savvy individual in the lab at the moment, and I actually enjoy handling these tasks. Our institute has mentioned plans to implement an online storage system, but it's been a year, and we haven’t heard a single word of progress. Additionally, the head of IT literally cannot deal with macOS—he asks me to input the password because he "doesn't do Macs" and even claimed that the power brick is where the internals of the latest iMac are located, which I found baffling coming from the head of IT. Connecting it to the network is easy enough.
So, essentially, it's either we handle it ourselves or don't have it at all. We simply want an automated, network-based backup system in case the original files on the computers are somehow lost.
My biggest concern isn't losing data, but rather spending lab funds on a system that doesn’t function as we’d like. That’s why I’m doing as much research as possible before committing.
I’ll go for a RAID 5 system. If anyone has recommendations on the models and software to use, please do share.
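For sizing, here is a rough back-of-the-envelope sketch of the RAID 5 arithmetic, assuming identical disks: one disk's worth of space goes to parity, so usable capacity is roughly (n - 1) x disk size, and the array survives a single disk failure. The function name and the 4 x 8 TB figures are only illustrative, and real formatted capacity will be somewhat lower.

```python
def raid5_usable_tb(disk_count, disk_size_tb):
    """Approximate usable capacity of a RAID 5 array of identical disks.

    One disk's worth of space holds parity, so usable space is
    (disk_count - 1) * disk_size_tb. Filesystem overhead is ignored.
    """
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_size_tb


# Illustrative numbers only: four 8 TB disks.
print(raid5_usable_tb(4, 8))  # 24
```

RAID 5 only protects against one disk failing; it does not replace a separate backup copy of the data.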
Unless you are really tech-savvy, don't plan to roll your own (though one certainly can). Go with an off-the-shelf unit from Synology or QNAP (search for these companies and you will find their products).
What is your budget?
Do you want the NAS to mount as a share on the local desktops, or do you want the local systems to back data up to the NAS automatically? A NAS file share and a plain data backup server are subtly different applications.
I would suggest thinking in reverse: a NAS with RAID is the place to store primary data/metadata. Data on the workstations should be just a copy (or a mounted partition) of what is properly organized into neat folders and resides on the NAS. That gives you one point of reference, with no digging through whatever naming conventions end users may have.
Then, on top of that, you should consider a backup of the NAS itself, be it Amazon Glacier or another cloud/storage service.
Re primary/raw data:
Enforce some naming conventions, suffixes, metadata, and checksums. Hunting through random folders for files that still carry the generic names assigned by the sequencing core makes no sense. Tag datasets already deposited in ENA/SRA and keep that information up to date.
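As a minimal sketch of the checksum part, assuming SHA-256 and a plain dictionary as the manifest (the function names here are made up for illustration, not part of any NAS software): record a digest for every file when the data lands on the NAS, then re-check later to spot missing or silently altered files.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large FASTQ/BAM files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(expected, root):
    """Return relative paths that are missing or whose digest has changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in expected.items() if current.get(name) != digest
    )
```

In practice the manifest would be saved next to the data (for example as a text file) right after transfer from the sequencing core, and `find_mismatches` run against it periodically.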
I agree with both of the other comments. I can't remember which system we had in our lab, but it was from a company with clear instructions. Looking at Synology, I am pretty sure it was from them. It was pretty easy to set up and we never had any major issues. I believe we did have to get IT to make the Ethernet port it was plugged into accessible from the local machines.
We had it mounted on all of the local machines and had a strict schema for putting data on it, as Darked89 suggests. Enforce it strongly and early, as things will cascade into a mess if you don't. Your PI should make you the de facto czar of the system and make that clear to everyone who will be using it. Also make sure the notifications for failed backups and failing disks have some redundancy, so that they reach more than one person.
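One way to enforce a schema early is to check file names automatically. The sketch below assumes a purely hypothetical convention of `<date>_<sample>_<assay>.<extension>`; the pattern and function names are illustrative and would need adapting to whatever your lab agrees on.

```python
import re

# Hypothetical convention: <YYYY-MM-DD>_<sample>_<assay>.<extension>
# e.g. 2024-03-18_mouse12_snRNAseq.fastq.gz
# Only the shape of the date is checked, not whether it is a real calendar date.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<sample>[A-Za-z0-9-]+)_"
    r"(?P<assay>[A-Za-z0-9-]+)"
    r"\.(?P<ext>[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*)$"
)


def parse_name(filename):
    """Return the name's fields as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def nonconforming(filenames):
    """List the file names that do not follow the convention."""
    return [name for name in filenames if parse_name(name) is None]
```

Running `nonconforming` over a new upload folder before files are moved into the shared tree gives the person in charge a short list of names to fix or send back.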