Question

Forum: Discussion: Uploading terabytes of data to NCBI SRA

21 months ago

James Reeve ▴ 130

Large genomic datasets are becoming increasingly common, and we often need somewhere to archive and share the data once a project is done. Probably the most widely used archive is NCBI's Sequence Read Archive (SRA). However, if you have ever tried to use it for large datasets, you will know that the FTP upload option is a pain: there are frequent drop-outs, and you have only a few minutes after logging in to navigate to your directory and start the upload. In short, I find data archiving a very frustrating experience.

I'd like to get a discussion going about tips and tricks for streamlining this process. In particular, I'd like to know how to upload to NCBI directly from a remote server.
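For the FTP drop-outs described above, one workaround is a script that runs on the remote server and resumes the upload instead of restarting it. The Python sketch below uses the standard-library `ftplib`: it changes to the upload directory immediately after login (so the short login window is not spent navigating by hand), and after each drop-out it reconnects, asks the server for the size of the partial file with `SIZE`, and continues from that byte offset via `storbinary(..., rest=offset)`, which sends `REST` before `STOR`. It is a minimal sketch under assumptions: the helper names `make_connect` and `upload_with_resume` are hypothetical, the host, credentials and upload directory must come from your own SRA submission instructions, and I have not confirmed that NCBI's FTP server accepts `REST` for uploads (if it answers with a 5xx error, `storbinary` typically raises `ftplib.error_perm` and the script stops rather than retrying).

```python
import ftplib
import os
import time
from typing import Callable


def make_connect(host: str, user: str, password: str, upload_dir: str) -> Callable[[], ftplib.FTP]:
    """Return a zero-argument function that logs in and jumps straight to upload_dir."""
    def connect() -> ftplib.FTP:
        ftp = ftplib.FTP(host, timeout=120)
        ftp.login(user, password)
        ftp.cwd(upload_dir)  # no manual navigation, so the login window is not wasted
        return ftp
    return connect


def upload_with_resume(
    connect: Callable[[], ftplib.FTP],
    local_path: str,
    remote_name: str,
    max_attempts: int = 10,
    wait_seconds: float = 30.0,
) -> int:
    """Upload local_path as remote_name, resuming after drop-outs.

    Returns the attempt number on which the upload finished.
    Assumes the server supports SIZE and REST for uploads.
    """
    total = os.path.getsize(local_path)
    for attempt in range(1, max_attempts + 1):
        ftp = None
        try:
            ftp = connect()
            ftp.voidcmd("TYPE I")  # binary mode, so SIZE reports bytes
            try:
                offset = ftp.size(remote_name) or 0
            except ftplib.error_perm:
                offset = 0  # nothing on the server yet
            if offset == total:
                return attempt  # a previous attempt already finished
            if offset > total:
                raise RuntimeError(f"remote {remote_name} is larger than the local file")
            with open(local_path, "rb") as fh:
                fh.seek(offset)
                # rest= makes ftplib send "REST <offset>" before STOR,
                # so the server appends instead of starting over
                ftp.storbinary(f"STOR {remote_name}", fh,
                               blocksize=1 << 20, rest=offset or None)
            return attempt
        except (ftplib.error_temp, ftplib.error_reply, OSError, EOFError) as exc:
            print(f"attempt {attempt} failed ({exc!r}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
        finally:
            if ftp is not None:
                try:
                    ftp.close()
                except OSError:
                    pass
    raise RuntimeError(f"upload of {local_path} failed after {max_attempts} attempts")
```

On the remote server you would start it under `nohup`, `screen` or `tmux` so the upload survives your own session dropping, e.g. `upload_with_resume(make_connect(HOST, USER, PASSWORD, UPLOAD_DIR), "reads_R1.fastq.gz", "reads_R1.fastq.gz")`, where the capitalised names are placeholders for the values NCBI gives you.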

score 2 · Answer 1 · 2023-08-21

If you need to upload tens of terabytes of data (or more) to NCBI, you should reach out directly to SRA support and work out a solution.

Otherwise, uploading with Aspera Connect should be the preferred solution (documentation: https://www.ncbi.nlm.nih.gov/sra/docs/submitfiles/). In theory, uploads would be limited only by the bandwidth your institution allows you to use with Aspera, since I would imagine NCBI has access to much larger network pipes than you do.
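Because Aspera's command-line client `ascp` (bundled with Aspera Connect) can resume partially transferred files, a simple retry loop run on the remote server goes a long way. The sketch below builds an `ascp` command in the form I recall from NCBI's submission instructions (`-i` key file, `-QT`, `-l100m`, `-k1`, `-d`) and re-runs it until it exits with status 0. Treat the flags as assumptions to check against the current NCBI page and `ascp --help` for your version; the function names are hypothetical, and the key file and destination (something like `subasp@upload.ncbi.nlm.nih.gov:uploads/<your-folder>`) must come from your own submission account.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def build_ascp_command(
    key_file: str,
    local_dir: str,
    remote_target: str,
    max_rate: str = "100m",
) -> List[str]:
    """Return an ascp argument list (flags assumed from NCBI's examples).

    -i   private key file supplied by NCBI
    -QT  fair transfer policy (-Q) with encryption disabled (-T)
    -l   target transfer rate, e.g. "100m"
    -k1  resume partial files when their attributes match
    -d   create the destination directory if needed
    """
    return [
        "ascp", "-i", key_file, "-QT", f"-l{max_rate}", "-k1", "-d",
        local_dir, remote_target,
    ]


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 5,
    wait_seconds: float = 60.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Re-run cmd until it exits 0; with -k1 each rerun resumes rather than restarts.

    Returns the attempt number that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(list(cmd))
        if result.returncode == 0:
            return attempt
        print(f"ascp exited {result.returncode} on attempt {attempt}; retrying")
        time.sleep(wait_seconds)
    raise RuntimeError(f"ascp failed {max_attempts} times")
```

For example, `run_with_retries(build_ascp_command("path/to/key_file", "/data/run1", "subasp@upload.ncbi.nlm.nih.gov:uploads/<your-folder>"))`, started inside `tmux` or `screen` on the remote server. The `run` parameter exists only so the loop can be tested without `ascp` installed.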

It looks like they also support uploads from Amazon S3 buckets.

score 1 · Answer 2 · 2023-08-21

21 months ago

benformatics 4.1k

I've never had a problem uploading hundreds of GB of data to the SRA, from both within and outside the US. If frequent drop-outs are a problem, you may need to check with your internet provider.

I'm based at a field station, so drop-outs are unfortunately unavoidable. I want to use the remote server since it's based in a major city.

Reply by James Reeve, 21 months ago

If your data is located on the remote server, there should be no issues with drop-outs. It sounds like the problem may be the link from the field station (is the data being generated there?) to the remote server.

Reply by GenoMax, 21 months ago
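Following GenoMax's point, the weak link is the copy from the field station to the remote server, so it may be worth confirming that files arrived intact before starting the NCBI upload from there. A minimal sketch, assuming Python 3 is available on both machines: build an MD5 manifest of the data directory on each side and compare them; anything missing or mismatched gets copied again. The function names are hypothetical, and MD5 is used here only as a transfer-integrity check, not for security.

```python
import hashlib
import os
from typing import Dict, List


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB FASTQ files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of(full)
    return manifest


def compare_manifests(source: Dict[str, str], copy: Dict[str, str]) -> Dict[str, List[str]]:
    """List files absent from the copy and files whose checksums differ."""
    return {
        "missing": sorted(set(source) - set(copy)),
        "mismatched": sorted(k for k in source.keys() & copy.keys() if source[k] != copy[k]),
    }
```

Run `build_manifest` on the field-station copy and on the server copy (for instance saving each with `json.dump`), then pass both to `compare_manifests`; an empty report means the server holds an identical copy and the upload to NCBI can start from there.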