Hardware spec for exome sequencing?
9.0 years ago
biogennw • 0

I am looking at setting up a focused exome sequencing service (human) in an institute without any supercomputer/cluster facilities. I am not sure if I should set up a small cluster or use a dedicated bioinformatics computer. I was hoping someone could give me an idea of the specs I would need for both options?

sequence next-gen-sequencing • 2.7k views

How much money do you have to spend?


Only between £5000 and £8000, but the cheaper the better really!


Even a small cluster is pretty much a non-starter at that price point. There are some 4-node units that you might be able to get at that price, but you're going to be short on storage to go with it. As I said in my answer below, which sequencer you're looking to support and what throughput you will be doing exomes at is what really makes the difference. With that information we could probably start suggesting some specific options that will fit within the budget.

9.0 years ago
DG 7.3k

Really it boils down to three things: what sequencer are you supporting, what throughput are you estimating in terms of number of samples sequenced per month/year, and what is your budget (even just as a ballpark)? Since you said focused exome sequencing (human?), I am assuming you are planning on supporting a NextSeq or similar medium-range sequencer? If you are talking about anything larger, there is no question you absolutely need a cluster.

I spent a few years doing all of the exome analysis and bioinformatics on a large set of exome sequencing projects. Usually we were doing about 6-12 exomes every 4-6 months or so, depending on patient recruitment from different families. I did all of it on a single dedicated workstation (8 cores, 50GB of RAM) and it worked pretty well, except when I had 12 exomes to process at once; then the pipeline would take a while to get through them. If you are doing this as a service for other people, turnaround time is key. And so is your storage space. Most sequencing service providers retain your data for you, at least for a while, and don't just toss it.

Personally, if I was offering this as a service I would look at building a small cluster. If possible you should get this located inside a machine room or data centre at your institution. They usually have services you can leverage for helping source and build it (although you may then be constrained on vendors) as well as maintaining it. Get good information on their policies first so you know what their restrictions are. I've had experiences when locating a server in a university's data centre where they would only support it if the hard drives were in RAID 5 or RAID 6 using hardware RAID cards, and they would only support Fedora for Linux OSes. Things like that can get really restrictive.

Of course, depending on institutional policies, you could always look at the route of going 100% cloud (or mostly cloud, maybe keeping your storage local). Especially if the machine won't be processing exomes 100% of the time, AWS and some useful tools will let you construct virtual clusters when you need them, process the data, and then shut them down. The funding model is different, as you need money to pay monthly fees and the like versus an upfront capital expenditure (OpEx versus CapEx can be a challenge in science), but it is something worth considering. In the long run it can actually be cheaper and easier. I'd recommend reading some of the many SlideShare presentations from BioTeam; they do a lot of consulting for this sort of thing and have made many different presentations over the years about the state of computing tech for genomics, the cloud, etc. Some great recommendations.

If you're thinking of building a local cluster and have a ballpark budget to fit within: I recently built a small cluster for supporting genomics in a clinical setting (targeted sequencing in oncology) for about $50K Canadian, and would be happy to share more of my experience and some vendors that had some really interesting things.


Thanks so much for your detailed response! I'm afraid cloud/outsourcing is out for us due to institution restrictions. We'll be using a NextSeq 500 and we currently have 10TB of storage on the server where the database will be stored. At the moment, due to cost restrictions (I can't spend much more than £8000), I'm thinking I'll need something like the single dedicated workstation you described but potentially with a little more RAM. Within about 6 months I'm expecting to have about 24 samples every 8 weeks or so, but it will be much slower to start off with.
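
For a rough feel of how long 10TB would last at that rate, here is a minimal Python sketch. The 10TB capacity and the 24 samples every 8 weeks come from the post above; the per-sample footprint is NOT given anywhere in this thread, so the values tried below are placeholder assumptions, and the function name is made up for illustration. It also assumes the storage starts empty and nothing is ever deleted.

```python
def months_until_full(capacity_tb, samples_per_batch, weeks_per_batch, gb_per_sample):
    """Estimate months until storage fills at a steady sample rate.

    Assumes the storage starts empty and that every sample's data is kept.
    """
    samples_per_week = samples_per_batch / weeks_per_batch
    gb_per_week = samples_per_week * gb_per_sample
    weeks = (capacity_tb * 1000) / gb_per_week  # 1 TB taken as 1000 GB
    return weeks / (52 / 12)                    # average weeks per month


if __name__ == "__main__":
    # ASSUMPTION: the GB-per-sample values are placeholders, not measurements.
    # Replace them with the real size of raw plus analysed data per exome.
    for gb_per_sample in (10, 30, 60):
        months = months_until_full(10, 24, 8, gb_per_sample)
        print(f"{gb_per_sample} GB/sample -> about {months:.1f} months")
```

Measuring the real footprint of one processed exome (FASTQ, BAM, VCF and any intermediates you keep) and plugging it in gives a much better number than any of the placeholders.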


You're definitely going to want to allocate a large part of your budget towards storage. 10 TB will probably fill up quicker than you think, especially if you're retaining any analysed data along with raw data, etc. And data should be stored redundantly, at least by using a RAID setup on your disks, because disk failure will happen.
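
Redundancy costs capacity, so it is worth budgeting for it up front. As a minimal sketch (the function name, disk count and disk size are hypothetical and only for illustration), RAID 5 gives up one disk's worth of space to parity and RAID 6 gives up two, ignoring filesystem overhead and hot spares:

```python
def usable_tb(num_disks, disk_tb, raid_level):
    """Usable capacity of a RAID 5 or RAID 6 array of equal-sized disks.

    RAID 5 uses one disk's worth of space for parity, RAID 6 uses two.
    Filesystem overhead and hot spares are ignored.
    """
    parity_disks = {5: 1, 6: 2}
    min_disks = {5: 3, 6: 4}
    if raid_level not in parity_disks:
        raise ValueError("only RAID 5 and RAID 6 are handled here")
    if num_disks < min_disks[raid_level]:
        raise ValueError(f"RAID {raid_level} needs at least {min_disks[raid_level]} disks")
    return (num_disks - parity_disks[raid_level]) * disk_tb


if __name__ == "__main__":
    # Hypothetical example: eight 4 TB disks.
    for level in (5, 6):
        print(f"RAID {level}: {usable_tb(8, 4, level)} TB usable of {8 * 4} TB raw")
```

RAID protects against disk failure but is not a backup, so a separate copy of the raw data is still worth planning for.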

I went with data storage from 45 Drives (http://45drives.com), which is a Canadian company making high-capacity storage servers. They offer a range of performance specs and prices, and they do custom work as well. It is perfectly possible to build a server with them that will offer a ton of storage but also high RAM and high CPU as well. Or you can just set it up as a fat NAS server for backup storage, again depending on your budget.

If you're going with the single fat server, spec out 10-12GB of RAM per CPU core and go with the best price/performance ratio you can get on the CPUs. A dual-CPU setup with Xeon E5-2620s is a pretty good bet in terms of core count and clock speed for the price.
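
The 10-12GB per core rule of thumb turns into a RAM target like this. It is only a sketch: the function name is made up, and the core count per CPU is an assumption, since it differs between E5-2620 generations; 6 is used as a placeholder, so check the spec sheet for the exact part you buy.

```python
def ram_range_gb(sockets, cores_per_cpu, gb_per_core=(10, 12)):
    """Return (low, high) total RAM in GB for a per-core RAM rule of thumb."""
    total_cores = sockets * cores_per_cpu
    low, high = gb_per_core
    return total_cores * low, total_cores * high


if __name__ == "__main__":
    # ASSUMPTION: 6 cores per CPU is a placeholder; the real figure depends
    # on which generation of the E5-2620 is purchased.
    low, high = ram_range_gb(sockets=2, cores_per_cpu=6)
    print(f"Dual-CPU server, 6 cores each: {low}-{high} GB of RAM")
```

In practice you would round to whatever DIMM configuration the motherboard supports.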
