Dear community members,
I recently became involved in a project to set up a medical genetics facility in a developing country. I have to admit that I am more of a "classical" bioinformatician in a Western country: I have never experienced a shortage of computational power, nor have I ever set up a server myself.
What would you aim for in terms of CPU, storage (solid-state or magnetic?), and RAM? The volume: currently 100 WES, with 1-2 new exomes per week. No sequencing will be performed on site, so no demultiplexing is needed. However, alignment is a must. Short variant calling and CNV calling, plus storage of databases and annotation, are the most typical tasks for this future server.
Thank you for any hints.
Some unrelated advice.
To be safe, apply the same security/privacy considerations you are used to, from the beginning and for everything. This will save you heartache down the road.
If possible, find a proper systems admin (someone whose day job it is) to manage the server/storage/security.
Yes, it seems to be a very hard job, and I am not sure there is a single systems admin in this country who would agree to join the project for the funds we [actually, the local team; I am an external advisor] can offer. Thanks a lot for this advice - I had totally forgotten about the security issues.
If you can get the local team to split off any identifying information and store it somewhere other than the server holding the sequence data, that will add a layer of protection.
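One way to picture that split, as a rough sketch rather than a vetted privacy design: give each patient a keyed pseudonymous code, keep the identifying fields in one table (stored elsewhere), and keep only the code next to the sequence data. The field names (`patient_id`, `name`, `fastq`), the `S-` prefix, and both function names are hypothetical, and the secret key is assumed to live off the analysis server.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable sample code from a patient ID using a keyed hash (HMAC-SHA256).

    Without the secret key, the code cannot be recomputed from the patient ID.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "S-" + digest[:16]  # hypothetical code format: prefix + 16 hex characters


def split_record(record: dict, identifying_fields: set, secret_key: bytes):
    """Split one record into (identity_row, research_row), linked only by sample_code."""
    code = pseudonymize(record["patient_id"], secret_key)
    identity_row = {"sample_code": code}   # goes to the separate, restricted store
    research_row = {"sample_code": code}   # goes to the server holding sequence data
    for field, value in record.items():
        if field in identifying_fields:
            identity_row[field] = value
        else:
            research_row[field] = value
    return identity_row, research_row


if __name__ == "__main__":
    # Made-up example record; the key here is a placeholder, not a real secret.
    record = {"patient_id": "P001", "name": "Jane Doe", "fastq": "P001_R1.fastq.gz"}
    identity, research = split_record(record, {"patient_id", "name"}, b"placeholder-key")
    print(identity)
    print(research)
```

Note that file names can themselves leak identifiers (as the made-up `fastq` value above does), so in practice files would need renaming to the sample code as well.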
Kudos for volunteering your time and expertise.
I'm not an expert on server deployments but, whatever configuration you settle on, make sure you give a lot of thought and attention to your data storage. Things like CPU and RAM can be swapped out or replaced, but if you lose data, you are toast. It might not be unreasonable to consider the data storage and backup solution to be of greater importance than the compute capacity, to some extent. The last thing you need is for a flood, hurricane, or fire to destroy all your storage volumes and take all your raw data with them. Also, as mentioned, evaluate cloud resources like AWS for feasibility.
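A backup only helps if it actually matches the primary copy, so it may be worth checking that regularly. Below is a minimal sketch, assuming the raw data sits in one directory tree and the backup is mounted as another; the function names are made up for illustration, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in 1 MiB chunks so large FASTQ/BAM files never load fully into RAM."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(primary: dict, backup: dict) -> dict:
    """Report files absent from the backup and files whose contents differ."""
    missing = sorted(set(primary) - set(backup))
    mismatched = sorted(k for k in primary if k in backup and primary[k] != backup[k])
    return {"missing": missing, "mismatched": mismatched}
```

Usage would be along the lines of `compare_manifests(build_manifest(Path(primary_dir)), build_manifest(Path(backup_dir)))`, where both directory variables are placeholders for your own mount points. An empty report means every primary file has a byte-identical copy in the backup.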