Hardware Needed To Analyse Microarray Data With R/Bioconductor
13.8 years ago
John ▴ 70

Hi

My lab has an ~80 chip affy Exon study planned for the near future which I'm planning to analyse using R/Bioconductor. This will be a first for the lab so we have no dedicated computing hardware other than the usual desktop PCs etc. but do have some budget to get in a dedicated machine.

So I'm trying to get advice on what spec of hardware we'll need to analyse this effectively without driving a coach and horses through the budget. Thoughts so far range from a 6-core machine (i7 975) with a 64-bit OS and at least 8GB of RAM, if not a lot more, with a sizeable price tag to match, down to a quad-core machine with 8GB at a tenth of the price.

Does anybody have any insights into what level of processing power we'll really need to do this analysis well?

Many thanks for your help.

John

hardware microarray analysis r bioconductor • 8.5k views

I wouldn't say that R isn't multithreaded. Maybe the language itself isn't, but there are many packages that allow one to take advantage of multiple cores.


I did 27 exon array chips on a 6GB i7 920 computer, including advanced analysis and just about everything you can think of. I don't see a problem scaling to 80 - at worst, you would need a bit more ram.


Thank you all for your input and ideas. I'll definitely be going for more RAM plus the capability to add more later, multiple cores and plenty of backed-up storage. I just need to decide on processor make, e.g. i7 vs Xeon vs AMD vs other. Has anybody tried different configurations for the same type of job (i.e. after moving institution) and come up with a preference? Just as important, are there processor makes and types to avoid when using R/Bioconductor?


If it's just for affy, the processor doesn't really matter: R is not multithreaded, so just get more RAM. Further, you should think about what to do with the machine after that. The affy processing is sort of a one-shot analysis with some follow-ups. I would try to get something that can be used as a computational server afterwards. Then a multi-CPU machine might pay off, given multiple users and other software to run.

13.8 years ago

Many analysis procedures are memory hungry, and 8GB of RAM is inadequate. Go for 64GB or 96GB.

The speed of individual CPU cores is less important than the number of them. Thus it is far better to buy a configuration with more and somewhat slower processors than fewer but the fastest ones.


Last time I checked the affy package methods were only using one core on my machine. If you say they can handle more, I probably should fix my configuration ;)


Often one can run separate R processes in parallel, one for each file. Also, affy may only work on one core today, but who knows what the future holds. The hardware needs to serve over the long term and across many possible usage scenarios.
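To make the "one process per file" idea concrete, here is a minimal sketch in Python of a launcher that starts one independent process per input file and caps how many run at once. It is only an illustration: the function name `run_per_file` and the per-chip script mentioned afterwards are hypothetical, not something from this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_per_file(command, files, max_workers=4):
    """Run `command + [file]` once per file, at most `max_workers` at a time.

    Returns a dict mapping each file to the exit code of its process.
    """
    def run_one(path):
        # Each call starts a separate OS process, so the work is spread over
        # several cores even when the program itself is single-threaded.
        completed = subprocess.run(command + [path], capture_output=True)
        return path, completed.returncode

    # Threads are only used to wait on the child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, files))
```

With a hypothetical per-chip script `process_one.R`, a call such as `run_per_file(["Rscript", "process_one.R"], cel_files, max_workers=4)` would keep four cores busy. Note that each process needs its own share of RAM, so the worker count is limited by memory as much as by cores.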

13.8 years ago

I've done analysis of Affy exon arrays (using XPS). You do not need a monster computer to do this analysis. What you do need is a big pile of RAM and adequate hard drive space. For example, you could buy an ordinary box (e.g. a Dell Precision workstation), buy 16GB of RAM from NewEgg, put in two 2-terabyte hard drives, and load a 64-bit Linux such as Ubuntu. This machine would also be perfectly adequate for many ordinary sorts of microarray analysis.

If you find yourselves doing a lot more complex bioinformatics analysis you will need multiple machines, but at that point the best bet is to use someone else's hardware, either a local compute cluster or cloud machines. However, it sounds like you're not there yet, and there's no reason to spend $10,000+ on a machine at this point.


Just to clarify, you will really want to avoid running out of RAM, at which point parts of the intermediate results get saved to the hard disk (swapping). That might slow things down by a factor of 100 or more.


Agreed. 16G is adequate for ~ 80 exon arrays using Bioconductor. In fact, you might scrape by using 8G but it would be slow, with swapping. Basically, get as much RAM as you can afford.
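A back-of-the-envelope way to sanity-check RAM figures like these is to estimate the size of one numeric matrix holding every probe for every chip. The sketch below is in Python and rests on assumptions that are not from this thread: values stored as 8-byte doubles, a round placeholder of 5 million probes per chip, and a hypothetical `copies` factor for the working copies an analysis may create.

```python
def matrix_gib(n_chips, n_probes, bytes_per_value=8, copies=1):
    """Approximate memory (in GiB) for `copies` dense chips x probes matrices."""
    total_bytes = n_chips * n_probes * bytes_per_value * copies
    return total_bytes / 2**30


# Placeholder numbers, for illustration only.
one_copy = matrix_gib(80, 5_000_000)
three_copies = matrix_gib(80, 5_000_000, copies=3)
print(f"one copy: {one_copy:.1f} GiB, three copies: {three_copies:.1f} GiB")
```

The point is only that memory grows linearly with the number of chips and with every extra copy held at once, which is why headroom above the raw data size matters.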


Thank you David, neilfws & Michael. Am I right that the XPS runs with Intel or AMD processors, while the Dell Precision workstations have Xeon? Are there any advantages or disadvantages to using one of these types of processors over the others? There certainly seems to be quite a price differential!

13.8 years ago

Going price effective

I would recommend 64GB of RAM, 16 cores, and 8TB of storage.

Going along with what Istvan suggested, you may want to look at something like this:

RAM will be the major part of the cost. You can get the price down to $3,600 if you choose 16GB of non-ECC RAM.

Putting it into the server-room

This is a rack-mount server. In a server room, modern computational biology hardware will have adequate cooling and scalability. Server rooms also usually have uninterruptible power and more reliable networks. Rack mounts for this type of server are cheaper if you don't have to pay for the server room (but please correct me if I'm wrong on this one). Finally, a server room will keep noise out of your workspace.

Considerations

Perhaps you could find a server room on your campus that will let you collocate the server with them. You would use the Linux server through SSH with X11 forwarding if you need graphics. For some incidents you will need physical access to the server with a keyboard and monitor available.

Things to use for your advantage

  1. Server room costs are covered by the university;
  2. The hardware for one server takes almost no time to maintain;
  3. A systems admin will be needed in any case (both for a personal server and for Amazon).

Another reason not to use the cloud for bioinformatics


http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/


@John, I found that AMDs have a higher performance-to-price ratio in the current generation of servers. For desktops, I doubt that it's different. For servers, a 12-core AMD Opteron chip costs about the same as a 6-core Intel Xeon chip. For work that can be spread across cores, having twice the number of cores will roughly translate to having your computation finish in half the number of days. (Not having enough RAM would multiply the number of days by 100 or more.)
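The rule of thumb above can be written down as a small estimate. This Python sketch assumes the work divides evenly across cores by default; the `parallel_fraction` parameter is an added assumption (Amdahl's law) for jobs that are only partly parallel, and `swap_penalty` stands in for the "100 or more" slowdown when RAM runs out. None of these names come from the thread.

```python
def estimated_days(single_core_days, cores, parallel_fraction=1.0, swap_penalty=1.0):
    """Rough run time in days on `cores` cores.

    parallel_fraction: share of the work that can use all cores (1.0 = all of it).
    swap_penalty: multiplier applied when the job does not fit in RAM.
    """
    if cores < 1 or not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("need cores >= 1 and 0 <= parallel_fraction <= 1")
    serial_fraction = 1.0 - parallel_fraction
    return single_core_days * (serial_fraction + parallel_fraction / cores) * swap_penalty


# 12 cores versus 6 cores on fully parallel work: half the time.
print(estimated_days(12, 6), estimated_days(12, 12))
```

If only part of the pipeline is parallel, the gain from extra cores shrinks accordingly, whereas the swap penalty applies to the whole run.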


Thanks Aleksandr. Would you still use AMD cores in a desktop-type set-up rather than a server, or are there disadvantages to using these?

13.8 years ago
Gareth Palidwor ★ 1.6k

If you're money constrained, you might consider using an Amazon Elastic Compute Cloud (EC2) instance to do the analysis. You can just switch to an instance with more memory/processors as required.


I do not recommend Amazon Cloud because it's overpriced. At SCALEx9, I got together with the creator of Amazon's monthly price calculator http://calculator.s3.amazonaws.com/calc5.html and we estimated $21k for an EC2 server (3-year reserved, 64GB RAM, running 24/7). You can save on monthly fees by not using the server, but you still have to pay $6,590 up front. Here is the interesting part: instead of using Amazon, you can buy a more powerful machine for under $6k with 3 years of parts replacement.


Yep, and it means you don't have to worry about who is going to administer the machines when your technically competent PhD student leaves and no-one has any clue how to update or fix the machine in question. This is an excellent suggestion.


@Daniel, having been a system administrator for more than 3 years, I can say that 5% of administration is hardware related, 85% is done through SSH and 10% through email. So administering EC2 would not be much different. You may save some time on OS installation because EC2 provides pre-built templates, but you still need to do the OS updates on your own.


@Daniel, I think it is a good idea, but only if it's cost-effective for you. It's hard to tell because there are many pricing variables. The 64GB RAM instance is $1.60 per hour and up (there are several other charges). That's $1,168/month and up if the PhD student forgets to shut down the VM instance.
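The $1,168 figure is just the hourly rate multiplied by the hours in an average month (24 × 365 / 12 = 730). A minimal Python sketch, using the $1.60 rate quoted above and ignoring the other charges; the function name and the `hours_per_day` parameter are illustrative:

```python
def monthly_cost(hourly_rate, hours_per_day=24.0, days_per_year=365):
    """Average monthly cost of an instance billed by the hour."""
    return hourly_rate * hours_per_day * days_per_year / 12


always_on = monthly_cost(1.60)                     # left running around the clock
office_hours = monthly_cost(1.60, hours_per_day=8)  # shut down outside an 8-hour day
print(f"always on: ${always_on:,.2f}/month, 8 h/day: ${office_hours:,.2f}/month")
```

The gap between the two numbers is the cost of forgetting to shut the instance down.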


1.) Server room costs are covered by the university; 2.) The hardware for one server takes almost no time to maintain; 3.) A systems admin will be needed in any case (both for a personal server and for Amazon).

I suggest a $5,689 personal server (see my answer). I got together with the creator of Amazon's monthly price calculator at SCALEx9 and we estimated $21k (6,590 + 409 × 12 × 3) for a similar EC2 server (3-year reserved, 64GB RAM, at 100% utilization).

That's if you run the instance 24/7. If you don't run it at all for 3 years, you still end up paying $6,590.

http://calculator.s3.amazonaws.com/calc5.html
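The comparison can be reproduced with a few lines of Python. This is a sketch using the figures quoted in this thread: $6,590 up front, which together with the $21k total implies roughly $409 per month over 36 months, against a $5,689 purchased server. Scaling the monthly fee linearly with utilization is a simplifying assumption, not Amazon's actual pricing model.

```python
def reserved_total(upfront, monthly_fee, months=36, utilization=1.0):
    """Total cost of a reserved instance over `months`.

    utilization: fraction of the time the instance runs (1.0 = 24/7).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return upfront + monthly_fee * months * utilization


full_time = reserved_total(6590, 409)                 # running 24/7 for 3 years
never_used = reserved_total(6590, 409, utilization=0)  # upfront fee only
own_server = 5689                                      # purchase price quoted above
print(full_time, never_used, full_time - own_server)
```

Even at zero utilization the upfront fee alone exceeds the quoted purchase price, which is the core of the argument.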


Thanks Aleksandr. Quite informative.



I haven't used Affymetrix exon chips, but I regularly run analyses of several hundred microarrays on my good old regular portable PC, which is currently priced at CAN$900. You don't need a supercomputer to run R/Bioconductor.

13.2 years ago
W Langdon ▴ 30

The standard use of R and Bioconductor assumes you will simply stack up all the microarray data, hence the need for vast memory. However, this need not be necessary. Using R, we analyse thousands of Affymetrix GeneChips using a PC with 4GB of RAM by making two passes through the data.

W. B. Langdon, G. J. G. Upton, R. da Silva Camargo, and A. P. Harrison. A survey of spatial defects in Homo Sapiens Affymetrix GeneChips. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(4):647-653

R code can be found at http://bioinformatics.essex.ac.uk/users/wlangdon/TCBB-2007-11-0161.tar.gz
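To illustrate the general two-pass idea (this is not the code from the paper or the archive above), here is a small Python sketch that computes a per-probe mean and standard deviation across chips while holding only one chip in memory at a time. The `load_chip` callback and the function name are hypothetical.

```python
import math


def two_pass_probe_stats(load_chip, chip_ids):
    """Per-probe mean and sample standard deviation across chips.

    load_chip(chip_id) must return a list of probe intensities, with the same
    length and probe order for every chip. Only one chip is held at a time.
    """
    chip_ids = list(chip_ids)
    n = len(chip_ids)
    if n < 2:
        raise ValueError("need at least two chips")

    # Pass 1: accumulate per-probe sums to obtain the means.
    sums = None
    for chip_id in chip_ids:
        values = load_chip(chip_id)
        if sums is None:
            sums = [0.0] * len(values)
        for i, v in enumerate(values):
            sums[i] += v
    means = [s / n for s in sums]

    # Pass 2: re-read each chip and accumulate squared deviations from the mean.
    sq_dev = [0.0] * len(means)
    for chip_id in chip_ids:
        values = load_chip(chip_id)
        for i, v in enumerate(values):
            sq_dev[i] += (v - means[i]) ** 2
    sds = [math.sqrt(d / (n - 1)) for d in sq_dev]
    return means, sds
```

Memory use depends on the number of probes per chip rather than on the number of chips, at the price of reading every file twice.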

Bill
