Question

How to download NCBI fastq data to local machine where access type is "Use Cloud Data Delivery"

3.7 years ago

daewowo ▴ 80

I have set up AWS and Google Cloud accounts with billing, but both are overwhelmingly complicated for a simple user. What I would like to do is download multiple fastq files from NCBI that have gs:// or s3:// locations to my local machine. I realise there are instructions, but I can't find a simple explanation of what I need to do.

Take AWS as an example. I have the AWS CLI installed. When I try:

aws s3 cp s3://sra-pib-src-7/thesra/thefastq

I get:

fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden

What exactly do I need to sign up for and set up on AWS to allow this data download?

Do I need to create an EC2 instance, log into the console, copy from S3 to the EC2 instance, and then download from there to my home machine? Or is there a way to bypass the EC2 instance and download directly to my home machine from the S3 bucket location on NCBI?

delivery NCBI cloud • 4.1k views

ADD COMMENT • link updated 3.7 years ago by GenoMax 150k • written 3.7 years ago by daewowo ▴ 80

To get data from NCBI, use fasterq-dump from the SRA Toolkit, or check whether the data has already been synced to ENA.

ADD REPLY • link 3.7 years ago by JC 13k

I have used SRA download and fasterq-dump before, but I want to download the original fastq files because I need the original machine read header information. SRA download doesn't keep the original machine ID in the header.

I just checked, and the ENA files also have the original machine ID, flow cell, etc. overwritten by the SRA number and an index. I need to obtain the original fastq.

ADD REPLY • link 3.7 years ago by daewowo ▴ 80

Can you post an example SRA# you are trying to download?

When data is set to "requester pays", an account on AWS/GCP is needed (which you said you have). You should be able to copy the data directly to your EC2 instance.

ADD REPLY • link 3.7 years ago by GenoMax 150k

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR10620024

It sounds like I would need to create an EC2 instance, download to that, and then pull from the EC2 instance to my home machine, rather than downloading directly to my home machine using aws s3 cp?

ADD REPLY • link 3.7 years ago by daewowo ▴ 80

score 3 · Accepted Answer · 2021-07-14

3.7 years ago

GenoMax 150k

If you want to download the data to your local computer, you can do so using the AWS/GCP command-line utilities. You can also copy the data directly to your EC2 instance if you are planning to analyze it there.

With Google you can copy the data like this (use real values for your project, etc.):

gsutil -u your_project cp gs://sra-pub-src-9/SRR10620024/BDR5-2_S3_L007_I1_001.fastq.gz.1 copy_location
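Since the question was about multiple files, the same command can be wrapped in a small loop. This is only a sketch under assumptions: the function name fetch_gs_list, the list file, and the RUN switch are made up for illustration. By default it only prints the commands, so you can check them before anything is billed to your project.

```shell
# Sketch: copy several requester-pays objects listed one per line in a file.
# fetch_gs_list, the list file and the RUN switch are hypothetical helpers.
set -eu

fetch_gs_list() {
  project="$1"    # GCP project that will be billed for the transfer
  list="$2"       # text file with one gs:// URI per line
  dest="${3:-.}"  # local destination directory (defaults to current dir)
  while IFS= read -r uri || [ -n "$uri" ]; do
    # Skip blank lines and lines starting with '#'
    case "$uri" in ''|'#'*) continue ;; esac
    if [ "${RUN:-0}" = "1" ]; then
      # Real transfer; stdin is detached so the loop keeps reading the list
      gsutil -u "$project" cp "$uri" "$dest" < /dev/null
    else
      # Dry run: only show the command that would be executed
      echo "gsutil -u $project cp $uri $dest"
    fi
  done < "$list"
}
```

Usage would be something like `fetch_gs_list your_project uris.txt copy_location` to preview, and the same call with `RUN=1` set in the environment to actually download.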

Something similar should be possible with AWS.
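A minimal sketch of the AWS side, assuming your credentials are already configured (for example with aws configure) and that the bucket is requester-pays. The aws s3 cp command takes a --request-payer requester option for this case; a 403 on HeadObject is commonly what you see when that option or valid credentials are missing. The function name fetch_requester_pays, the RUN switch, and the example URI are placeholders, so copy the real s3:// location from the run's data access page.

```shell
# Sketch: copy one requester-pays object from S3 straight to a local path.
# fetch_requester_pays and the RUN switch are hypothetical helpers.
set -eu

fetch_requester_pays() {
  uri="$1"        # s3:// location copied from the SRA run page
  dest="${2:-.}"  # local destination (defaults to current dir)
  if [ "${RUN:-0}" = "1" ]; then
    # Real transfer; your AWS account is billed for the request and egress
    aws s3 cp "$uri" "$dest" --request-payer requester
  else
    # Dry run: only show the command that would be executed
    echo "aws s3 cp $uri $dest --request-payer requester"
  fi
}
```

For example, `fetch_requester_pays s3://example-bucket/SRRXXXXXXX/example.fastq.gz .` prints the command, and setting `RUN=1` runs it. No EC2 instance is involved; the copy goes directly to the machine where the command runs.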

ADD COMMENT • link 3.7 years ago by GenoMax 150k

See response from GenoMax which worked, thanks!

ADD REPLY • link 3.7 years ago by daewowo ▴ 80