I am trying to set up Amazon AWS EC2 with RStudio and Linux AMIs for scRNA-seq analysis, and I am curious what kind of setup is sufficient and cost-effective for bioinformatics use.
For RStudio, I tried the RStudio AMI (developed by Louis Aslett) and mainly run Seurat (integration), SingleR (LTLA version), Slingshot, and other pseudotime tools. I also run 10x Cell Ranger, velocyto, Scanpy, and other Python packages on an EC2 Linux AMI. All the data are stored in S3; I copy them from S3 to EBS and move the results back after the analysis is done.
I tried an m5d.12xlarge instance, which comes with 48 vCPUs, 192 GiB of memory, and 2 x 900 GB of storage, and I think I was getting killed by the storage cost.
Anyway, I'd really appreciate it if anybody who uses AWS EC2 could comment on the ideal EC2 setup for bioinformatics analysis.
I would try to figure out the needs for each of your applications, since they are very different. If you are concerned about costs, you should optimize each step.
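One rough way to compare steps is to estimate compute and storage cost separately for each one. The sketch below is only an illustration: the `Step` class, the `step_cost` helper, and every number in the example are made-up placeholders, not real AWS prices or measured runtimes. It also assumes the EBS volume exists only for the duration of the step and is billed pro rata per GB-month, so check the current pricing pages before relying on it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One analysis step (hypothetical structure, for illustration only)."""
    name: str
    hours: float            # wall-clock hours the instance runs
    hourly_rate_usd: float  # on-demand price of the chosen instance type
    storage_gb: float       # size of the EBS volume attached for this step


def step_cost(step: Step, ebs_usd_per_gb_month: float,
              hours_per_month: float = 730.0) -> float:
    """Estimate compute plus pro-rated EBS cost for a single step.

    Assumes the volume is created for the step and deleted afterwards.
    """
    compute = step.hours * step.hourly_rate_usd
    storage = (step.storage_gb * ebs_usd_per_gb_month
               * step.hours / hours_per_month)
    return compute + storage


if __name__ == "__main__":
    # Placeholder numbers: replace with your own runtimes and current prices.
    steps = [
        Step("cellranger", hours=10, hourly_rate_usd=2.0, storage_gb=500),
        Step("seurat_integration", hours=3, hourly_rate_usd=1.0, storage_gb=100),
    ]
    for s in steps:
        cost = step_cost(s, ebs_usd_per_gb_month=0.10)
        print(f"{s.name}: ~${cost:.2f}")
```

Filling this in with your real numbers should show quickly whether compute or storage dominates each step, and whether a smaller instance or a volume that is deleted right after the run would help.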
Also, if cost is a concern, maybe Amazon is not the best option. The advantage of Amazon is the ability to quickly increase or decrease your computational capacity. You are paying a premium for that.