I'm planning to get a new workstation for bioinfo work. It's going to be mostly NGS data processing.
I'm hesitating between AMD (Opteron or FX) and Intel (Xeon or i7). The former is much cheaper for 2x more cores, but much more power-hungry. For some bioinfo tools (e.g. bowtie2) I noticed that 2 Opteron cores = 1 Xeon core at similar clock speed. So investing more money in an Intel CPU makes sense, no?
What's your opinion?
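One way to frame the question above is cost per "Xeon-equivalent" core. The sketch below is only illustrative: the prices and core counts are made-up placeholders, the function name is hypothetical, and the 0.5 relative speed is just the rough "2 Opteron cores = 1 Xeon core" observation for bowtie2, which may not hold for other tools. It also ignores power draw.

```python
def cost_per_equivalent_core(price, cores, relative_core_speed):
    """Price divided by the number of reference-core equivalents.

    relative_core_speed is the per-core throughput relative to the
    reference CPU (1.0 = same speed as one reference core).
    """
    return price / (cores * relative_core_speed)


# Hypothetical numbers for illustration only, not real quotes.
intel = cost_per_equivalent_core(price=2000.0, cores=8, relative_core_speed=1.0)
amd = cost_per_equivalent_core(price=1000.0, cores=16, relative_core_speed=0.5)

print(f"Intel: {intel:.2f} per equivalent core")
print(f"AMD:   {amd:.2f} per equivalent core")
```

Plugging in real quotes, plus an estimate of electricity cost over the machine's lifetime, would give a fairer comparison than core counts alone.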
I think that memory (and storage) is more important just now than CPU cores. A dual-processor machine allows more memory, so you can get programs like STAR running in minutes where tophat would take hours. As STAR needs about 35GB of contiguous memory to store one genome, your minimum spec would be about 64GB. Ideally you want much more. 1TB is not a luxury for de novo transcriptome assembly.
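The sizing arithmetic above (about 35GB for the index, so about 64GB minimum) can be sketched as follows. The 1.5x headroom factor for the OS, reads and other processes, and the assumption that RAM comes in power-of-two sizes, are mine and not from any tool's documentation; the function name is hypothetical.

```python
def min_ram_gb(index_gb, headroom_factor=1.5):
    """Smallest power-of-two RAM size (in GB) that fits the index plus headroom.

    headroom_factor is an assumed multiplier, not a measured value.
    """
    needed = index_gb * headroom_factor
    size = 1
    while size < needed:
        size *= 2
    return size


# About 35GB for the genome index, as quoted above.
print(min_ram_gb(35))
```

With a 35GB index and the assumed headroom, this lands on the 64GB figure mentioned above; a larger index or several concurrent jobs would push it higher.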
1TB for transcriptome assembly is scary given that SGA can assemble a human genome in 64GB. Even the more memory-hungry allpaths can do that in 512GB. Transcriptome assembly is on a much smaller scale. Back to the question: Intel is indeed faster than AMD in general, but I have not seen a 2X difference even between a recent 3.5GHz Xeon X5690 and a much older 2.3GHz Opteron 8376.
Human is only diploid, though. I have seen at least one researcher use most of 1TB for a single run. Agreed on CPU performance differences. When memory is not an issue, most of our bottlenecks are I/O, including the internal bus and disk writes.
I know transcriptome assembly poses new challenges, but the amount of data is nothing compared to whole genome assembly. Someone using 1TB of RAM does not mean we should use 1TB. When buying hardware we should ask what a typical mainstream tool requires, rather than what a badly engineered in-house pipeline requires.
I do agree for the most part, as the 1TB instances (actually it was ~0.8TB) are rare and extreme, and pre-processing the data is much more sensible (if there is sufficient skill and knowledge of the system to do so). The point I am trying to make, though, is that moving the data within the computer is the slow point with our big data, rather than the speed of a particular core. In other words, keeping data in memory is usually the biggest speed-up.
These kinds of questions always seem to me like deciding whether to use one brand of calculator or another to add two numbers together.
I wish it were that simple...
Unfortunately, Intel have pushed ahead of AMD in terms of performance. It isn't a matter of brand, but which system can provide you with the best computational performance for your money.