I generally get files from BaseSpace using the 'bs' shell utility, with the following command:
bs cp conf://~default/Run/{run_id} {output_dir}/{run_id}
This creates a folder named {run_id} inside my {output_dir}, which I then turn into a .tar.gz file using
tar -czf {output_dir}/{run_id}.tar.gz {output_dir}/{run_id}
I then upload the archive to my cloud service (AWS/GCP) using their standard command-line utilities.
I've been looking for a way to avoid having two copies of the same data on my machine at any point: either gzipping the BCL files while they are being copied from BaseSpace, or copying them straight from BaseSpace to Google Cloud Storage/Amazon S3. Is there any way to do that? I'd rather not use a large, high-storage machine at AWS/GCP just for the transfer, so copying directly into the storage service would be ideal. Failing that, having the files arrive on the machine already compressed would still be better than copying the raw folder and then zipping it, which consumes more than twice the storage.
Also, if anyone has experience with cloud services, is there a specific GCP service you'd recommend for this copying process? I've been thinking of using Cloud Run, so that we'd have an established, automatic pipeline.
You could provision a small VM, install the BaseSpace utilities on it, and then copy to your storage bucket directly after logging into that VM.
You could use other utils like: https://github.com/BFSSI-Bioinformatics-Lab/BaseMountRetrieve
I'm looking to automate the process, so that we don't have to log into the VM and run a script by hand. Instead, I'd like a machine that I can send a POST request to with the 'run_id' and have it do the whole process by itself. We have at least one run every week, so automating this manual step would help a lot. (It's already automated today, but it runs in a VM that is active 24/7, and I'd like to make it serverless so we don't have to keep a VM up all the time.)
Yeah, in the past I have just used basemount to keep the BaseSpace location mounted on the server I am on; then you can implement your automatic methods however you want. https://basemount.basespace.illumina.com/
If you already have an AWS instance then maybe you can just mount it there? Not sure.
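If the run is reachable as a normal directory (for example through a basemount mount), one way to avoid both the raw local copy and the local .tar.gz is to have tar write the archive to stdout and pipe it into the cloud CLI, which can read an upload from stdin. This is only a rough sketch: the function name `stream_run_archive`, the mount path and the bucket names are made up for illustration, and I haven't checked how basemount lays out the files of a run, so adjust the source path to whatever you see on your mount.

```shell
#!/usr/bin/env bash
set -euo pipefail

# stream_run_archive SRC_DIR UPLOAD_CMD...
# (hypothetical helper) Tars and gzips SRC_DIR to stdout and pipes the stream
# into UPLOAD_CMD, which must read the archive from stdin. No intermediate
# .tar.gz is written to local disk.
stream_run_archive() {
    local src_dir="$1"
    shift
    # -C switches to the parent directory first, so paths inside the archive
    # start with "{run_id}/" instead of the full local path.
    tar -czf - -C "$(dirname "$src_dir")" "$(basename "$src_dir")" | "$@"
}

# Example calls (mount point and bucket names are assumptions):
#   stream_run_archive "$HOME/BaseSpace/Runs/$run_id" \
#       gsutil cp - "gs://my-bucket/$run_id.tar.gz"
#   stream_run_archive "$HOME/BaseSpace/Runs/$run_id" \
#       aws s3 cp - "s3://my-bucket/$run_id.tar.gz"
```

Both `gsutil cp - gs://...` and `aws s3 cp - s3://...` accept `-` as the source to upload from stdin. For very large streams the AWS CLI may also need `--expected-size`, so check its documentation for your run sizes. With `pipefail` set, a failure in tar makes the whole pipeline fail instead of silently uploading a truncated archive.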