Setting up our own server for Genomic Analysis
10 months ago
Maverick ▴ 10

Hello Everyone,

We are planning to build our own server for genomic analysis at our university. We currently rent computing and storage from another university's facility, where we upload our data, run our pipeline, and download the output.

We use 200 TB of raw data storage. The new system we build ourselves will need 300 TB, and we also plan to keep a replica of the file storage as a backup, which is an additional 300 TB. We run our data pipeline in an environment with 50 TB of space, and the input data per sample is 0.5 to 1 TB. The software and tools we use are Annovar, BWA (Burrows-Wheeler Aligner), GATK, R, Picard, SPiCE, and Samtools.

Our pipeline generally takes a day, sometimes two, to complete. From my understanding, we need 600 TB in total for archival storage, including the replica. How much RAM would be reasonable for pushing and pulling data (a couple of TB for processing) to and from this storage?
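
To make the storage arithmetic above explicit, here is a rough sketch of the estimate. The 300 TB primary store, the single replica, and the 50 TB working space are the figures stated in this post. The overhead fraction (capacity lost to RAID parity, filesystem metadata, and free-space headroom) is purely an assumed placeholder, and the function names are hypothetical.

```python
# Rough capacity estimate. Figures are taken from the post, except OVERHEAD,
# which is an ASSUMED placeholder and should be replaced with a real value
# for whatever storage layout is eventually chosen.
PRIMARY_TB = 300       # planned raw data storage
REPLICA_COPIES = 1     # one backup replica of the primary store
SCRATCH_TB = 50        # working space the pipeline runs in
OVERHEAD = 0.20        # ASSUMPTION: fraction of raw disk not usable for data


def usable_tb_needed(primary_tb, replica_copies, scratch_tb):
    """Usable capacity: primary store, its replicas, and pipeline scratch."""
    return primary_tb * (1 + replica_copies) + scratch_tb


def raw_tb_needed(usable_tb, overhead):
    """Raw disk to buy so that `usable_tb` remains after overhead."""
    return usable_tb / (1 - overhead)


archival_tb = PRIMARY_TB * (1 + REPLICA_COPIES)
usable_tb = usable_tb_needed(PRIMARY_TB, REPLICA_COPIES, SCRATCH_TB)
raw_tb = raw_tb_needed(usable_tb, OVERHEAD)

print(f"Archival (primary + replica): {archival_tb} TB")
print(f"Usable incl. scratch:         {usable_tb} TB")
print(f"Raw disk at {OVERHEAD:.0%} overhead:    {raw_tb:.1f} TB")
```

The point of the sketch is only that the 600 TB figure covers archival data alone. Working space and storage overhead come on top of it, so the raw disk to purchase is likely to be noticeably higher.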

What server configurations would be ideal for this? What questions do I need to answer before I start building this server? What problems would I need to address? Any pointers, checklists, or a question to start with would be much appreciated.

Thank you in advance!

server hardware sequencing • 430 views
10 months ago
GenoMax 147k

There are two different questions here: the compute resources and the storage.

When you are looking at over 0.5 PB of storage, you need to plan carefully for a robust system. While cost is always a prime concern, going cheap can bite you big time in the future (if the data is critical). Setting this up in-house is going to steal time from a productive scientist unless you have access to a systems administrator.

If you are already doing this on existing hardware, then at a minimum you are going to need whatever you are using now, so start there and plan for future additions.


Thank you so much for your response. FYI: I made an error when describing the storage details for our pipeline, but your reply still holds true. Thank you once again. I will start weighing the critical nature of our data against our budget.
