Question

Medical genetics server for a developing country

1

13 months ago

German.M.Demidov ★ 2.9k

Dear community members,

I was recently brought into a project to set up a medical genetics facility in a developing country. I have to admit that I am more of a "classical" bioinformatician in a Western country: I have never experienced a shortage of computational power, nor have I ever set up a server myself.

What would you aim for in terms of CPU, storage (solid-state or magnetic?), and RAM? The volume: currently 100 WES samples, with 1-2 new exomes per week. No sequencing will be performed on site, so no demultiplexing is needed. However, alignment is a must. Short variant calling and CNV calling, plus storage of databases and annotation, are the most typical tasks for this future server.

Thank you for any hints.

server • 924 views

ADD COMMENT • link updated 13 months ago by steve ★ 3.5k • written 13 months ago by German.M.Demidov ★ 2.9k

2

Some unrelated advice.

To be safe, apply the same security/privacy considerations you are used to, from the very beginning and for everything. This will save you heartache down the road.

If possible find a proper systems admin (whose day job is such) to manage the server/storage/security.

ADD REPLY • link 13 months ago by GenoMax 147k

0

Yes, it seems to be a very hard job, and I am not sure there is a single systems admin in this country who would agree to participate in the project for the funds we [actually, the local team; I am an external advisor] can offer. Thanks a lot for this advice - I had totally forgotten about the security issues.

ADD REPLY • link 13 months ago by German.M.Demidov ★ 2.9k

1

If you can get the local team to split off any identifying information and store it somewhere other than the server holding the sequence data, that will add a layer of protection.
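One way the split could look in practice, as a minimal sketch only: each record is divided into an identity row (kept off the analysis server) and an analysis row that carries just a keyed pseudonym. The field names (`patient_id`, `name`, `fastq`), the `S` prefix, and the helper names are hypothetical; the secret key is assumed to be stored with the identity table, not on the sequencing server.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(patient_id: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable sample code from a patient ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "S" + digest[:length].upper()


def split_record(record: dict, identifying_fields: set, secret_key: bytes):
    """Split one record into (identity_row, analysis_row) linked only by sample_code."""
    code = make_pseudonym(record["patient_id"], secret_key)
    identity_row = {"sample_code": code}
    analysis_row = {"sample_code": code}
    for field, value in record.items():
        # Identifying fields go to the identity table, everything else to analysis.
        target = identity_row if field in identifying_fields else analysis_row
        target[field] = value
    return identity_row, analysis_row


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumption: generated once and kept off the server
    record = {"patient_id": "P001", "name": "Jane Doe", "fastq": "run42_R1.fastq.gz"}
    identity, analysis = split_record(record, {"patient_id", "name"}, key)
    print(identity)  # store separately from the sequence data
    print(analysis)  # safe to keep next to the FASTQ/BAM files
```

File names on the server would then be based on the sample code rather than on anything that identifies the patient.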

Kudos for volunteering your time and expertise.

ADD REPLY • link 13 months ago by GenoMax 147k

2

I'm not an expert on server deployments but, whatever configuration you settle on, make sure you give a lot of thought and attention to your data storage. Things like CPU and RAM can be swapped about or replaced, but if you lose data, you are toast. It might not be unreasonable to consider the data storage and backup solution to be of greater importance than the compute capacity, to some extent. The last thing you need is a flood or hurricane or fire to destroy all your storage volumes and lose all your raw data. Also as mentioned, evaluate cloud resources like AWS for feasibility.
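A backup only helps if the copies are actually intact, so it may be worth recording checksums of the raw data when it arrives and re-checking them against each backup. The following is a rough sketch using only the Python standard library; the function names and the directory layout are illustrative assumptions, not part of any particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large FASTQ/BAM files do not need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (by relative path) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose content has changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```

The manifest would be built once on the primary storage, saved alongside the data, and then passed to `verify` with the root of each backup copy.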

ADD REPLY • link 13 months ago by steve ★ 3.5k

score 3 · Answer 1 · 2023-10-26

When setting up servers, look at the Ansible ecosystem. If you set up one server, a second is typically not far behind, so use templating systems like Ansible.

The Galaxy admins on GitHub have lots of example Ansible scripts for setting up your own servers. Start simply, copy and paste, etc. It's really worth it, especially if you have to expand into the cloud later.

https://github.com/ARTbio/GalaxyKickStart

I'd start with nf-core pipelines for analysis.

The most important thing might be trying to get a reliable power supply and cooling though.

score 2 · Answer 2 · 2023-10-26

You can pretty much always (re)analyze data by getting CPU time on some cluster, AWS, etc. The crucial thing is to have solid data storage and a backup, so that the raw(ish) data stays intact. So you may start with RAID/ZFS storage and a UPS. Check your connection speed and see whether, e.g., AWS Glacier is a viable option for long-term data storage.
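To judge whether off-site archiving is realistic, a back-of-the-envelope calculation along these lines may help. It is only a sketch: the per-sample size, the number of copies, the uplink speed, and the efficiency factor are all assumptions to be replaced with measured values from the actual site.

```python
def storage_needed_gb(existing_samples: int, samples_per_week: float,
                      weeks: int, gb_per_sample: float, copies: int = 2) -> float:
    """Total space for all samples after `weeks`, counting every stored copy."""
    total_samples = existing_samples + samples_per_week * weeks
    return total_samples * gb_per_sample * copies


def upload_hours(gigabytes: float, uplink_mbit_per_s: float,
                 efficiency: float = 0.7) -> float:
    """Hours needed to upload `gigabytes` over a link of the given nominal speed.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    actually achieved; 1 GB is treated as 8000 megabits.
    """
    megabits = gigabytes * 8 * 1000
    seconds = megabits / (uplink_mbit_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    GB_PER_EXOME = 15.0  # assumption: FASTQ + BAM + VCF for one exome
    three_years = storage_needed_gb(100, 2, 3 * 52, GB_PER_EXOME, copies=2)
    weekly_upload = upload_hours(2 * GB_PER_EXOME, uplink_mbit_per_s=10)
    print(f"Storage after 3 years (2 copies): {three_years:.0f} GB")
    print(f"Weekly upload at 10 Mbit/s: {weekly_upload:.1f} h")
```

If the weekly upload takes longer than the link is reliably available, shipping disks or keeping a second local copy in another building may be the more practical backup.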

Lower-end Epyc boxes (a single 24-core CPU, 64 GB RAM) should be good enough to start.

Even with a single NGS data processing server, go for Slurm and Nextflow/Snakemake from month one. This will save you a lot of CPU time and brain time.
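To give a feel for what running alignment through Slurm involves, here is a small sketch that writes an `sbatch` script for one sample. It assumes `bwa` and `samtools` are installed on the server and that reads are paired-end; the function name, file names, and resource numbers are placeholders. In practice, Nextflow or Snakemake would generate and submit such jobs for you.

```python
import shlex


def make_sbatch_script(sample: str, ref: str, fq1: str, fq2: str, out_bam: str,
                       cpus: int = 8, mem_gb: int = 16) -> str:
    """Build the text of an sbatch script that aligns one sample and sorts the BAM."""
    q = shlex.quote  # protect paths that contain spaces or shell metacharacters
    sort_threads = 2
    bwa_threads = max(1, cpus - sort_threads)  # avoid oversubscribing the allocation
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=align_{sample}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --output=align_{sample}.%j.log",  # %j expands to the Slurm job ID
        "set -euo pipefail",
        f"bwa mem -t {bwa_threads} {q(ref)} {q(fq1)} {q(fq2)} "
        f"| samtools sort -@ {sort_threads} -o {q(out_bam)} -",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = make_sbatch_script("S1", "ref.fa", "S1_R1.fastq.gz",
                                "S1_R2.fastq.gz", "S1.sorted.bam")
    print(script)  # save to a file and submit with: sbatch <file>
```

Because Slurm queues the jobs and enforces the requested CPUs and memory, several exomes can be submitted at once without overloading a single 24-core machine.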