Let's say you're a core facility serving customers both internal and external to your institution. You want to have an efficient way to distribute NGS data such as BAM files, FASTQ, and custom QC reports to your customers. You're open to pretty much any method as long as it is reliable, cost-effective, and convenient for both you and your customers. How would you accomplish this?
In principle, I like this idea too.
However, there are two main drawbacks: 1) Confidentiality: people often don't like data leaving the institution. 2) Doing analysis in the cloud is hardly routine at the moment, so you'd be wasting time and money putting the data in the cloud only for it to be downloaded to local storage again.
1) True, see the USA PATRIOT Act: data encryption should be considered. 2) The question is not about computation but about sharing data.
Indeed, the original question was about sharing, but I replied to the answer which suggested EC2 storage was a good solution because of computation (amongst other reasons). I was referring to that, not the original question.
I appreciate the thoughts. You are absolutely correct that people don't like data to leave the institution. S3 is FISMA "moderate" certified, which is good enough for protected health information. Public key encryption would secure data transmission, and some kind of passkey encryption should be used for securing access to the files.
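To make the "securing access to the files" part a bit more concrete, here is a minimal sketch of time-limited, signed download links, using only the Python standard library. It is not the S3 API and not necessarily what any particular service does internally; it only illustrates the general idea of handing a customer a link that stops working after a deadline. The secret key, the host `data.example.org`, the file path, and the function names `sign_link` and `verify_link` are all made up for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical shared secret, known only to the core facility's server.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_link(base_url, path, expires_at, key=SECRET_KEY):
    """Return a download URL for `path` that is valid until `expires_at` (Unix time)."""
    message = f"{path}\n{expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "sig": signature})
    return f"{base_url}{path}?{query}"


def verify_link(url, now=None, key=SECRET_KEY):
    """Return True only if the signature matches and the link has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        # Missing or malformed parameters: reject the link.
        return False
    message = f"{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    signature_ok = hmac.compare_digest(signature.encode(), expected.encode())
    return signature_ok and now < expires_at


if __name__ == "__main__":
    one_week = int(time.time()) + 7 * 24 * 3600
    link = sign_link("https://data.example.org", "/run42/sample1.bam", one_week)
    print(link)
    print(verify_link(link))
```

The link only controls who can fetch the file and for how long. Encrypting the file contents themselves (e.g. with the customer's public key) would be a separate step.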
With respect to the comment about "putting data in the cloud": S3 is a content distribution network, plain and simple. You are likely using it every day without realizing it (especially if you use Dropbox). EC2, on the other hand, is a platform for "cloud" computing. It would definitely be wasteful to store your genomic data on an EC2 server for distribution, just as it is wasteful to transfer files "in" to the cloud for analysis. Placing the data on S3 eliminates both issues by providing fast and reliable distribution to the wider internet, as well as immediate transfer within all of Amazon's web services.
Ah, ok. I may have misunderstood what S3 is. Thanks for the clarification.