My work focuses on analyzing genomes (including human) to discover new motifs and cluster them. The clustering step alone takes a huge amount of processing power. I estimate that it will take approximately 2 months on my current computer (i5-4670K @ 4.1 GHz with all 4 cores loaded, 8 GB RAM).
Because that is such a long time to tie up my computer, I haven't been able to complete it yet. I will also need more RAM, but I cannot estimate the peak memory requirement until I actually complete a run.
Fortunately, I received approx. 4K dollars of funding to build a workstation, and that is why I need some help.
I would rather go with a faster CPU with many cores, and the new AMD 1950X chip is within an acceptable price range. It also supports 8 RAM sticks in quad-channel mode at speeds up to 3600 MHz.
However, I hear so much about ECC (Error-Correcting Code) RAM and how essential it is for workstations. Unfortunately, ECC RAM is more expensive and slower than non-ECC RAM. There are ECC UDIMMs and ECC LRDIMMs, both with speeds only up to 2133 MHz and only dual-channel capability (at least according to what I have read). According to the manufacturer, the 1950X supports only ECC UDIMMs, and the largest ECC UDIMMs I can get my hands on are 16 GB.
If I insist on going with ECC, then (1) I either have to get ECC UDIMMs at 2133 MHz with dual-channel capability and use them with the 1950X.
(2) Or I can get a server CPU such as the Intel E5-2630 v4 or AMD EPYC 7301, which are in my price range. However, according to the product details, the aggregate CPU frequency (boost frequency x thread count) of these CPUs is significantly lower than that of the 1950X.
(E5-2630 v4 = 20 threads x 3.1 GHz vs EPYC 7301 = 32 threads x 2.7 GHz vs 1950X = 32 threads x 4 GHz)
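To make the comparison in parentheses concrete, here is a small sketch that multiplies out the figures quoted above. It assumes those counts and clocks as given in the post, and the resulting "aggregate GHz" is only a crude upper bound: it ignores differences in per-clock performance, and sustained all-core clocks are typically lower than peak clocks.

```python
# Figures as quoted in the post: (parallel units, peak clock in GHz).
cpus = {
    "E5-2630v4": (20, 3.1),
    "EPYC 7301": (32, 2.7),
    "1950X": (32, 4.0),
}


def aggregate_ghz(units, clock_ghz):
    """Crude throughput proxy: parallel units multiplied by clock speed."""
    return units * clock_ghz


aggregate = {name: aggregate_ghz(*spec) for name, spec in cpus.items()}
baseline = aggregate["1950X"]

for name, total in aggregate.items():
    # Print each chip's aggregate figure and its ratio to the 1950X.
    print(f"{name:10s} {total:6.1f} GHz  ({total / baseline:.2f} of 1950X)")
```

By this metric the two server CPUs come out at roughly half and two thirds of the 1950X, which is the gap the question is weighing against ECC support and upgradability.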
On the other hand, a dual-socket server motherboard can be upgraded in the future with a second CPU and additional RAM sticks.
(3) If I choose to let go of ECC, then I can get RAM sticks with more capacity and higher speed.
So my questions are:
Is ECC really that crucial for the type of bioinformatics I do? If so, which route should I take, (1) or (2)? Or are there any other suggestions?
I doubt RAM speed will be a significant bottleneck. Even in gaming workloads, RAM speed makes only a minimal difference. I'd prioritise capacity, and probably ECC, over speed. Basically all server memory is ECC, for good reason.
Before you settle on a CPU, also assess how well multithreaded your process is. If there are parts where you're reduced to a single core, then clock speed will be a higher priority and aggregate cycles may not count for much. If it's very well multithreaded, then sharing the load will definitely speed the job up.
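As a rough illustration of that trade-off, Amdahl's law gives the best-case speedup when only part of a job runs in parallel. This is a minimal sketch: the parallel fractions below are hypothetical placeholders, not measurements of your clustering code, and the formula ignores memory and I/O limits.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup when only `parallel_fraction` of the runtime scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fractions; replace with a value profiled from the real job.
for p in (0.50, 0.90, 0.99):
    row = ", ".join(
        f"{n} cores: {amdahl_speedup(p, n):5.2f}x" for n in (4, 16, 32)
    )
    print(f"parallel fraction {p:.2f} -> {row}")
```

If half of the runtime is serial, even 32 cores cannot reach a 2x speedup, so per-core clock speed matters most. If nearly everything runs in parallel, the extra cores pay off almost linearly.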