We are planning to host a small web app for the computational genomics domain on a cloud hosting solution. I would like to know how development differs when hosting a web application on a cloud host compared with a conventional host or an on-site server. I look forward to hearing insights from those who have implemented or used cloud-based bioinformatics applications.
Have you tried asking this question on StackOverflow or even SuperUser? They're obviously not bioinformatics specialists, but they may have good ideas about the cloud part of your question... If you got good answers there, you could put a link to that question here!
@ALL: Thanks for your input. I found this discussion on StackOverflow and the blog very useful. I will get back to you all soon after setting up our server.
Have you looked at Google App Engine? I think the easiest way to answer your question is a simple calculation of how much throughput and CPU you expect to use, and what that would cost on off-site dedicated hosting (100-200 bucks a month for a basic application) versus Amazon. On-site hosting opens up a whole plethora of security issues and maintainability nightmares for that server. Also, some foresight about on-demand scalability would be useful; if you need it, that would have you leaning toward EC2.
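The back-of-the-envelope comparison suggested above could be sketched roughly as follows. This is only an illustration: the function names and the hourly rate in the usage note are hypothetical placeholders, not actual Amazon prices, and only the 100-200 a month figure for dedicated hosting comes from this thread.

```python
def on_demand_monthly_cost(hourly_rate, hours_used,
                           storage_gb=0.0, storage_rate_per_gb=0.0):
    """Estimate a month of pay-per-use hosting.

    All rates are assumptions supplied by the caller; look up the
    provider's current price list before relying on any number.
    """
    return hourly_rate * hours_used + storage_gb * storage_rate_per_gb


def break_even_hours(dedicated_monthly, hourly_rate):
    """Hours of use per month at which pay-per-use costs the same
    as a flat-rate dedicated host."""
    return dedicated_monthly / hourly_rate


def cheaper_option(dedicated_monthly, hourly_rate, hours_used):
    """Return 'on-demand' or 'dedicated', whichever is cheaper
    for the expected usage (ties go to 'dedicated')."""
    on_demand = on_demand_monthly_cost(hourly_rate, hours_used)
    return "on-demand" if on_demand < dedicated_monthly else "dedicated"
```

For example, `cheaper_option(150, 0.10, 200)` compares a 150-a-month dedicated host with 200 hours of an instance at a made-up rate of 0.10 per hour. The point is only that the answer depends on how many hours you actually expect to run.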
Or, if you're a Ruby dev, go with Heroku. In general, there are lots of good options. It will boil down to how much computing you are going to do and how much scale-out storage you might need in the future.
Thanks Deepak. I am not a Ruby pro, but I can manage. I think Heroku is costly compared to AWS, or am I missing something?
I want to host a genomics-based app with Perl, R (several libraries), BLAST, HMMER, some custom-developed programs, Plink and a JavaScript library for plotting. I think a small AWS Reserved Instance for 1 year with an Ubuntu-based AMI will be a good start. Please let me know your thoughts. Thanks.
One of the emerging platforms for cloud hosting is DotCloud. I tried their service and it's pretty impressive how you can pull in the various development tools (MySQL, PHP, Python, Rails etc.) that you may need to host a web application.
In short, I found that it simplifies the deployment process (once you have everything figured out and developed locally). Because deployment takes only a few simple steps, it may attract new developers to try the service. They also claim to allow further customisation and control over your deployed apps.
In summary, I would call DotCloud a glue, and a very efficient host at that. It may be a little far-fetched, but I think they are true game-changers in providing cloud hosting on top of the Amazon EC2 network itself. At the time of writing they are still in beta; nevertheless, this may be the best time to give them a try.
Although I have not used it yet, I have heard many good things about Amazon EC2. There was a recent paper that used Bowtie to call SNPs on Amazon EC2. I went to a seminar for developers hosted by Amazon, and they mentioned that it is SAS 70 compliant and can support HIPAA. They run many different types of OS and seem to have good customer service.
Thanks, I am following the research papers on cloud computing in bioinformatics, but I couldn't find one with technical details such as the preferred AMI for bioinformatics. AFAIK, this paper explains the use of AWS EC2 for SNP calling; it is not a hosted application. Please let me know if I am missing something.
One thing to keep an eye out for is that the CPUs are all virtual: what one gets is computational power equivalent to, say, 20 CPUs rather than a system with 20 distinct CPUs.
Thanks. I am going with a single CPU in the beginning and will add more later depending on usage. Will adding CPUs at a later stage require significant changes to the code?
No changes to the code are necessary. It is more of a benchmarking problem: it is harder to predict how a single CPU with the "computational power" of, say, 20 slower CPUs performs when running 20 jobs than how 20 actual slower CPUs perform running those jobs.
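One way to approach this benchmarking question is simply to time the same batch of jobs with different numbers of worker processes on the instance you are considering. The sketch below is a minimal illustration in Python using only the standard library; `cpu_bound_job` is a made-up stand-in for a real analysis job (a BLAST or HMMER run, for instance), and the job size and worker counts are arbitrary assumptions.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def cpu_bound_job(n):
    """Hypothetical stand-in for one analysis job: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_jobs(num_jobs, workers, n=200_000):
    """Return wall-clock seconds needed to finish num_jobs jobs.

    With workers <= 1 the jobs run one after another in this process;
    otherwise they are spread over `workers` separate processes.
    """
    start = time.perf_counter()
    if workers <= 1:
        for _ in range(num_jobs):
            cpu_bound_job(n)
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(cpu_bound_job, [n] * num_jobs))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Run 20 jobs with an increasing number of worker processes.
    for workers in (1, 2, 4):
        elapsed = time_jobs(num_jobs=20, workers=workers)
        print(f"{workers} worker(s): {elapsed:.2f} s for 20 jobs")
```

If the elapsed time stops improving as workers are added, the instance probably does not have that many cores' worth of real capacity for this kind of workload. Real jobs also compete for memory and disk, so the actual tools should be timed as well.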
Thanks. Do you have a manuscript based on the benchmarking? Have you compared different instance types (on-demand, spot, reserved)? Any comments on the Amazon Machine Image for bioinformatics (an Ubuntu base AMI and bioperl-max, or any other bioinformatics-based AMI)?
Thanks Deepak. Your list of AMIs will be a good start for me. Eric's image looks interesting, but I am not able to find the AMI based on a recent version of Ubuntu.
Thanks for the pointer, Nicojo. I will check StackOverflow, SuperUser and ServerFault.