Hi Biostars,
Our small lab only has a couple of heavy machines for running BLAST and similar jobs, so we were looking into Amazon EC2 with the CloudBioLinux AMI: https://aws.amazon.com/amis/cloudbiolinux-ubuntu-13-04-2013-08-28.
I wanted to blastx some sequences against UniRef50 on the cloud. By running small files and timing them, we estimated how fast the full job would be on the cloud and whether it would be worth our money.
Our computational machine: 2 x Intel quad-core E5506 2.13 GHz, 12 GB RAM, 1 x 1 TB SATA hard drive.
Amazon machine: 3.8xlarge, with 60 GiB of memory, 32 vCPUs, 108 EC2 Compute Units, 640 GB of SSD-based local instance storage, 64-bit platform.
We compared the performance on the same dataset and DB:
Our local machine: 1500 sequences in 120 min, BLOSUM62, 8 threads.
Amazon with the CloudBioLinux AMI: 1500 sequences in 60 min, BLOSUM62, 100 threads on the 3.8xlarge.
Per hour, that works out to:
**Local**: 750seqs:60min
**Amazon**: 1500seq:60min
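To make the comparison explicit, here is a small sketch of the arithmetic using only the numbers above. The "scaling efficiency" line is a rough assumption on my part: it simply divides the speedup by the ratio of 32 vCPUs to the 8 threads used locally, and ignores differences in CPU generation, memory and disk.

```python
def seqs_per_hour(n_sequences, minutes):
    """Throughput in sequences per hour."""
    return n_sequences * 60.0 / minutes


# Timings reported in the post.
local = seqs_per_hour(1500, 120)   # local machine, 8 threads
amazon = seqs_per_hour(1500, 60)   # EC2 instance, 100 threads requested

speedup = amazon / local

# Rough assumption: compare 32 vCPUs on EC2 with the 8 threads used locally.
core_ratio = 32 / 8
scaling_efficiency = speedup / core_ratio

print(f"local:   {local:.0f} seqs/hour")
print(f"amazon:  {amazon:.0f} seqs/hour")
print(f"speedup: {speedup:.1f}x with {core_ratio:.0f}x the cores")
print(f"scaling efficiency: {scaling_efficiency:.0%}")
```

So the instance is only about twice as fast despite having four times the cores, which is why it felt underwhelming.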
We also compared an 'empty' VM versus CloudBioLinux:
CloudBioLinux AMI: 1500 sequences, pam32, 100 threads, in 20 min.
Amazon Ubuntu AMI (sudo apt-get install ncbi-blast+, then downloaded and created the uniref50 db): 1500 sequences, pam32, 100 threads, in 10 min.
cloudbiolinux: 1500seqs:20min
amazonUbuntu: 1500seqs:10min
We were not impressed with the performance of EC2. Has anybody had similar experiences with EC2 services?
Maybe some of you could give us some pointers on where we went wrong?
So basically, reading the uniref database to and from memory is the bottleneck... Wow, I did not expect that. Thanks for the insight. How would you explain the difference between an empty installation and the cloudbiolinux AMI?
For I/O-intensive jobs, the best approach is to use the local machine (ephemeral) storage: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html. In this case it's an SSD, so it should be quite fast. You'll then need to copy any final results you want to keep back to EBS or S3, since local storage goes away once the machine terminates. As for CloudBioLinux versus a fresh AMI, the difference could be the version of blast+ available. We badly need to update the Amazon AMI with the latest version. If the versions are the same, it's likely fluctuation in EBS read/write throughput, which can vary.
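As a rough illustration of that workflow, here is a minimal Python sketch that copies the database files onto local storage and builds the blastx command pointing at the copy. The helper names `stage_db` and `blastx_command` are hypothetical, and the demo uses temporary directories with empty placeholder files so it runs anywhere; on a real instance you would pass the EBS directory holding the database and the mount point of the ephemeral disk instead (where that is mounted depends on the AMI, so check it first). Only standard blast+ options (`-query`, `-db`, `-out`, `-num_threads`) are used.

```python
import shutil
import tempfile
from pathlib import Path


def stage_db(db_dir, db_name, scratch_dir):
    """Copy all files of a BLAST database (db_name.*) to scratch_dir.

    Returns the database prefix on the scratch disk, suitable for -db.
    """
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    for db_file in sorted(Path(db_dir).glob(db_name + ".*")):
        shutil.copy2(db_file, scratch / db_file.name)
    return str(scratch / db_name)


def blastx_command(query, db_prefix, out, threads):
    """Build the argument list for a blastx run against the staged database."""
    return [
        "blastx",
        "-query", str(query),
        "-db", db_prefix,
        "-out", str(out),
        "-num_threads", str(threads),
    ]


# Demo with placeholder files; the directories stand in for EBS and local SSD.
ebs_dir = Path(tempfile.mkdtemp(prefix="ebs_"))
ssd_dir = Path(tempfile.mkdtemp(prefix="ssd_"))
for suffix in ("phr", "pin", "psq"):
    (ebs_dir / f"uniref50.{suffix}").touch()

db_prefix = stage_db(ebs_dir, "uniref50", ssd_dir)
cmd = blastx_command("query.fasta", db_prefix, ssd_dir / "hits.txt", 32)
print(" ".join(cmd))
```

The command list can be handed to `subprocess.run(cmd, check=True)` once blast+ is installed. Any results written to the local disk still need to be copied back to EBS or S3 before the instance terminates.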
Did not know that. Some other interesting tidbits on that page:
Thanks Brad, I just had the same realization about using the internal storage, so I rechecked, and it was EBS...
The BLAST version in 12.04 (the blank install) is 2.2.25+, from Jan 3, 2012.