Hello everyone,
We are planning to build our very own server for genomic analysis at our university. We currently rent computing and storage from another university's facility, where we simply upload our data, run our pipeline, and download the output.
We currently use 200 TB of raw data storage. The new server we are building ourselves will need 300 TB, and we also plan to keep a replica of that storage as a backup, which adds another 300 TB. We run our data pipeline in a 50 TB working environment, with sample input data of 0.5 to 1 TB. The software and tools we use are Annovar, BWA (Burrows-Wheeler Aligner), GATK, R, Picard, SPiCE, and Samtools.
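For rough sizing, the figures above can be added up per storage tier and then scaled for the space lost to redundancy. Below is a minimal Python sketch that uses the quoted numbers as placeholders. The `USABLE_FRACTION` value of 0.8 is a hypothetical assumption standing in for RAID parity and filesystem overhead; the real value depends on the disk layout you choose.

```python
# Rough storage sizing from the figures quoted in the post (all in TB).
# USABLE_FRACTION is a hypothetical assumption, not from the post: it stands in
# for RAID parity and filesystem overhead, which depend on the chosen layout.

PRIMARY_TB = 300   # raw-data store needed on the new server
REPLICA_TB = 300   # backup copy of the primary store
SCRATCH_TB = 50    # working space the pipeline runs in

USABLE_FRACTION = 0.8  # hypothetical: ~20% lost to parity/filesystem overhead


def raw_capacity_tb(usable_tb: float, usable_fraction: float) -> float:
    """Raw disk capacity needed to end up with `usable_tb` of usable space."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return usable_tb / usable_fraction


usable_total = PRIMARY_TB + REPLICA_TB + SCRATCH_TB
raw_total = raw_capacity_tb(usable_total, USABLE_FRACTION)

print(f"usable space needed: {usable_total} TB")
print(f"raw disk to buy:     {raw_total:.1f} TB")
```

Keeping the scratch space as its own tier makes it easy to see that the 600 TB archival figure does not yet include room for the pipeline to run.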
Our pipeline usually takes a day, sometimes two, to complete. As I understand it, we need 600 TB in total for archival storage, including the replica. How much RAM would be reasonable for pushing data to and pulling data from this storage (a couple of TB at a time for processing)?
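Moving a couple of TB in and out of storage is typically bounded by network and disk throughput rather than by RAM, so one way to frame this part of the question is to estimate transfer time. Below is a minimal Python sketch, assuming decimal terabytes (1 TB = 10^12 bytes) and a hypothetical 70% link efficiency; the link speeds tried are illustrative and do not come from the original post.

```python
# Back-of-the-envelope time to move pipeline inputs/outputs over a network link.
# The link speeds and the default 70% efficiency are hypothetical assumptions.


def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Approximate wall-clock hours to move `size_tb` terabytes over a link.

    size_tb    -- data volume in decimal terabytes (1 TB = 10**12 bytes)
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of the nominal line rate actually achieved
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                   # bytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600          # seconds -> hours


if __name__ == "__main__":
    for gbps in (1, 10, 25):
        print(f"2 TB over {gbps} Gb/s: ~{transfer_hours(2, gbps):.1f} h")
```

Under these assumptions, 2 TB takes several hours on a 1 Gb/s link but well under an hour at 10 Gb/s, which is worth weighing against a pipeline that already runs for one to two days. RAM sizing for the compute itself (BWA, GATK, and so on) is a separate question.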
What server configurations would be ideal for this? What questions do I need to answer before I start building this server? What problems will I need to address? Any pointers, checklists, or questions to start with would be much appreciated.
Thank you in advance!
Thank you so much for your response. FYI: I made an error when describing the storage details for our pipeline, but your reply still holds. Thank you once again. I will start weighing how critical our data is against our budget.