Has anyone used the 1000 Genomes public data set available on Amazon S3?
Or, more precisely: has anyone used the BAM files directly from an AWS service such as Elastic MapReduce?
I can download the files to EBS, unpack them, and re-upload them to S3, but that is more expensive (and more work) than using the free public copy.
Thank you for any insight, Justin
Edit: I am currently looking into hadoop-bam: http://sourceforge.net/projects/hadoop-bam/
Yes. It says to access the data the same way you would access any other S3 data. It then describes how to start an EC2 instance from their AMI, and the rest is a tutorial that is not specific to AWS. I guess the title of my question is misleading: my real problem is using BAM files in Elastic MapReduce, which doesn't know how to split records in a BAM file.
Oh, I get it. Please edit your post; maybe someone here knows the answer.
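On the splitting problem mentioned above, it may help to see why a BAM file is splittable in principle. BAM is stored as BGZF, a series of small independent gzip blocks, and each block header carries its own total size in a "BC" extra subfield. The sketch below is only an illustration under that assumption: it builds two toy BGZF-style blocks in memory, finds the block boundaries, and decompresses starting from the second one. The function names (`make_bgzf_block`, `bgzf_block_offsets`, `decompress_from`) are made up for this example and are not part of hadoop-bam or any AWS API.

```python
import gzip
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Build one toy BGZF-style block (hypothetical helper for this demo)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib header
    deflated = comp.compress(payload) + comp.flush()
    # 12-byte gzip header + 6-byte extra field + data + 8-byte trailer
    bsize = 12 + 6 + len(deflated) + 8 - 1  # BSIZE is total block length minus 1
    header = struct.pack("<4BI2BH", 31, 139, 8, 4, 0, 0, 255, 6)
    extra = struct.pack("<2BHH", 66, 67, 2, bsize)  # 'B', 'C', SLEN=2, BSIZE
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload))
    return header + extra + deflated + trailer


def bgzf_block_offsets(buf: bytes) -> list:
    """Return the byte offset at which each BGZF block starts."""
    offsets = []
    pos = 0
    while pos < len(buf):
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
            "<4BI2BH", buf, pos
        )
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError("not a BGZF block at offset %d" % pos)
        # Walk the gzip extra subfields looking for the 'BC' entry.
        p = pos + 12
        end = p + xlen
        bsize = None
        while p + 4 <= end:
            si1, si2, slen = struct.unpack_from("<2BH", buf, p)
            if (si1, si2) == (66, 67) and slen == 2:
                bsize = struct.unpack_from("<H", buf, p + 4)[0]
                break
            p += 4 + slen
        if bsize is None:
            raise ValueError("no BC subfield at offset %d" % pos)
        offsets.append(pos)
        pos += bsize + 1  # jump straight to the next block
    return offsets


def decompress_from(buf: bytes, offset: int) -> bytes:
    """Decompress everything from a block boundary onward."""
    return gzip.decompress(buf[offset:])


# Toy data standing in for a BAM file's compressed blocks.
block1 = make_bgzf_block(b"first chunk of reads")
block2 = make_bgzf_block(b"second chunk")
data = block1 + block2
offsets = bgzf_block_offsets(data)
second = decompress_from(data, offsets[1])
```

Note that a block boundary is not the same as a record boundary: an alignment record can straddle two blocks, so a real MapReduce input format needs extra logic to find the first complete record in each split. That is presumably the part a library like hadoop-bam takes care of.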