Hi Everyone,
Over a year ago I asked this question, "Constructing Compute Resources In Support Of Ngs In A Hospital Setting", about the NGS compute requirements people were thinking about at the time. A lot has changed in the year and a half since I visited the topic; in particular, NGS in the clinic is now far more common. So I thought it worth revisiting the topic with a new question.
Thoughts and opinions, particularly from people with some knowledge of NGS in health care, would be most welcome. I'll preface this by saying that I just got a new job, so starting next week I will officially be the Clinical Bioinformatician for part of the healthcare system here in Canada. The DNA diagnostics lab has purchased two MiSeqs (so they have redundancy), and I'll be in charge of compute, data handling, analysis, etc. Most of what we hear about data storage and compute for NGS is geared towards either large sequencing centers or individual labs. I realize the number of samples you run has a huge impact on the amount of data generated, but I am guessing there are some people out there with more direct experience in a clinical setting.
Cloud solutions are a no-go here in Canada with our healthcare system, so everything must be on site. The budget also isn't large, so it is a trade-off between storage size, redundancy, and compute speed/capacity. Ultimately I think archiving will need to be handled with magnetic tape or archival optical disc (or the new M-Disc I have seen recently), because I don't think we will be in a position to support archival storage on hard disk if we are obligated to keep records and tests for the same length of time as a pathology report (20-25 years).
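To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how archive size scales with the retention period. Every number in it (runs per year, GB per run, number of copies, compression ratio) is a placeholder assumption, not a figure from our lab or from the vendor; only the 25-year retention comes from the requirement above. The function name archive_size_tb is made up for illustration.

```python
def archive_size_tb(runs_per_year, gb_per_run, years, copies=2, compression_ratio=1.0):
    """Estimate total archive size in terabytes (1 TB = 1000 GB).

    All inputs are assumptions supplied by the caller:
      runs_per_year     -- sequencing runs archived per year
      gb_per_run        -- uncompressed data kept per run, in GB
      years             -- retention period in years
      copies            -- redundant copies kept (e.g. two tape sets)
      compression_ratio -- original size / compressed size
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if copies < 1:
        raise ValueError("need at least one copy")
    raw_gb = runs_per_year * gb_per_run * years
    return raw_gb * copies / compression_ratio / 1000.0


if __name__ == "__main__":
    # Placeholder inputs only: 100 runs/year, 10 GB/run, 25 years,
    # 2 copies, 4x compression.
    estimate = archive_size_tb(100, 10, 25, copies=2, compression_ratio=4.0)
    print(f"Estimated archive size: {estimate} TB")
```

Swapping in real run counts and per-run sizes should show fairly quickly whether hard disk is out of reach or whether tape/optical is only needed for the oldest data.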
So what are some innovative solutions you would consider, especially to keep costs as low as possible? If you are currently supporting a MiSeq in the clinic, what are you doing?
Hopefully we can have some good discussion.
One thing you should look at for storage is NGS-specific alternatives to gzip/bzip2. I don't have the numbers to hand, but I tested some reference-free methods a while ago and they gave better compression ratios than gzip/bzip2. Supposedly, you can get even better ratios with algorithms that use a reference genome.
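If you want a baseline to compare any NGS-specific tool against, a minimal sketch along these lines might help. It only measures the general-purpose gzip and bzip2 codecs from Python's standard library; it does not implement any NGS-specific or reference-based compressor. The function name compression_ratios and the toy FASTQ record are made up for illustration, and the toy record is far more repetitive than real reads, so its ratios say nothing about real data.

```python
import bz2
import gzip


def compression_ratios(data: bytes) -> dict:
    """Return original_size / compressed_size for gzip and bzip2."""
    if not data:
        raise ValueError("need non-empty input")
    compressed_sizes = {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }
    return {name: len(data) / size for name, size in compressed_sizes.items()}


if __name__ == "__main__":
    # Toy input: one made-up FASTQ record repeated many times.
    # For a real measurement, read the bytes of one of your own
    # uncompressed FASTQ files instead.
    record = b"@read1\nACGTTGCAACGTTGCA\n+\nIIIIIIIIIIIIIIII\n"
    for name, ratio in compression_ratios(record * 1000).items():
        print(f"{name}: {ratio:.1f}x")
```

Running the same function on a representative FASTQ file from your own instrument gives the number that a specialised compressor has to beat to be worth the extra tooling.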