I have set up AWS and Google Cloud accounts with billing, but both are overwhelmingly complicated for a simple user. What I would like to do is download multiple fastq files from NCBI that have gs:// or s3:// locations to my local machine. I realise there are instructions, but I can't find a simple explanation of what I need to do.
Take AWS as an example. I have the AWS CLI installed. When I try:
aws s3 cp s3://sra-pib-src-7/thesra/thefastq .
I get:
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
What exactly do I need to sign up for and set up on AWS to allow this data download?
Do I need to create an EC2 instance, log into the console, copy from S3 to the EC2 instance, and then download from there to my home machine? Or is there a way to bypass the EC2 instance and download directly to my home machine from the S3 bucket location given by NCBI?
To get data from NCBI, use fasterq-dump from the SRA Toolkit, or check whether the data is already synced to ENA.
I have used SRA download and fasterq-dump before, but I want to download the original fastq files because I need the original machine read header information. SRA download doesn't keep the original machine ID in the header.
I just checked, and the ENA files also have the original machine ID, flow cell and other details overwritten by the SRA number and an index. I need to obtain the original fastq.
Can you post an example SRA# you are trying to download?
When data is set to "requester pays", you need an account on AWS/GCP (which you said you have). You should be able to copy the data directly to your EC2 instance.
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR10620024
It sounds, then, like I would need to create an EC2 instance, download to that, and then pull from the EC2 instance to my home machine, rather than downloading directly to my home machine using aws s3 cp?
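For the direct-download case asked about here, S3 itself does not require an EC2 instance: aws s3 cp can write to a local directory. A 403 on HeadObject is a common symptom of reading a requester-pays bucket without the --request-payer requester option, although it can also mean the account simply lacks access. The following is a minimal sketch, assuming the bucket is requester-pays and that aws configure has been run with your own credentials. Whether a given run's original files can be read this way is not confirmed in this thread. copy_cmds is a hypothetical helper name, and the S3 path is the placeholder from the question, not a real object.

```shell
#!/usr/bin/env bash
# Sketch: print one `aws s3 cp` command per S3 URI for a requester-pays bucket.
# Assumption: URIs and the destination contain no spaces.
# copy_cmds is a hypothetical helper, not part of the AWS CLI.
copy_cmds() {
  local dest="$1"; shift
  local uri
  for uri in "$@"; do
    # --request-payer requester bills the request and transfer to your account
    printf 'aws s3 cp %s %s --request-payer requester\n' "$uri" "$dest"
  done
}

# Dry run with the placeholder path from the question; nothing is downloaded.
copy_cmds . "s3://sra-pib-src-7/thesra/thefastq"

# To actually download, pipe the printed commands to bash:
#   copy_cmds . "s3://..." "s3://..." | bash
```

Printing the commands first makes it easy to check the list before paying for any transfer; data leaving AWS for a home machine is charged to the requester. On Google Cloud, the analogous requester-pays option is gsutil -u YOUR_PROJECT_ID cp, where YOUR_PROJECT_ID is the project to bill.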