5.3 years ago
noodle
Let's say I download a PacBio run file from the NCBI SRA dataset shown here: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9849809
Download link here: https://sra-download.ncbi.nlm.nih.gov/traces/sra2/SRR/009618/SRR9849809
Can someone tell me the format of the downloaded SRR9849809 file? It doesn't seem to be a standard compressed format, unless I missed something.
Thanks!
See Tutorial: How to download raw sequence data from GEO/SRA, although in this case you don't need to split files.
Or directly download the FASTQ (sequences) from ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR984/009/SRR9849809/SRR9849809_subreads.fastq.gz
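For other runs, the ENA directory can usually be worked out from the accession alone. Here is a minimal sketch, assuming the layout seen in the link above (first six characters, then a zero-padded subdirectory for accessions longer than nine characters). The function name `ena_fastq_dir` is made up for illustration, and the file names inside the directory vary per run, so check the listing before downloading.

```shell
# Build the ENA FTP directory for a run accession.
# Assumption: the directory layout matches the one in the link above.
ena_fastq_dir() {
    acc=$1
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    case ${#acc} in
        9)  sub="" ;;                                            # e.g. SRR123456
        10) sub="00$(printf '%s' "$acc" | cut -c10)/" ;;         # e.g. SRR9849809 -> 009/
        11) sub="0$(printf '%s' "$acc" | cut -c10-11)/" ;;
        12) sub="$(printf '%s' "$acc" | cut -c10-12)/" ;;
        *)  echo "unexpected accession length: $acc" >&2; return 1 ;;
    esac
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s%s/\n' "$prefix" "$sub" "$acc"
}

# Example (not run here): fetch the subreads file mentioned above.
# curl -O "$(ena_fastq_dir SRR9849809)SRR9849809_subreads.fastq.gz"
```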
Thank you!
A related but different question - I noticed there are s3 and gs bucket listings. Do you know if SRA has a public bucket? Or is there a way to request access? Thanks again :)
s3://sra-pub-run-4/SRR9849809/SRR9849809.1
gs://sra-pub-run-4/SRR9849809/SRR9849809.1
You can install the gcloud utilities (part of Cloud SDK) on your server. You can then copy data directly from the Google bucket:
gsutil cp gs://sra-pub-run-4/SRR9849809/SRR9849809.1 your_local_disk
Update: Even though the Google storage bucket is public, it appears that you have to pay to download the data ("Bucket is requester pays bucket").
The AWS command line utility provides similar functionality for Amazon buckets.
Thanks, unfortunately it seems like these are not public buckets.
I think the buckets are public, but you need to pay for data egress. So you will have to provide a valid google compute/cloud project name for billing.
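A rough sketch of what the billed copy might look like for either bucket. It only prints the command rather than running it, since running it incurs charges. The helper name `requester_pays_cmd` and the project name `my-billing-project` are placeholders, not anything NCBI provides; `gsutil -u` and `aws s3 cp --request-payer requester` are the standard requester-pays options of those tools.

```shell
# Print (do not run) the copy command for a requester-pays bucket URI.
# Arguments: bucket URI, local destination, billing project (used for gs:// only).
requester_pays_cmd() {
    uri=$1
    dest=$2
    project=$3   # hypothetical Google Cloud project that gets billed
    case $uri in
        gs://*) printf 'gsutil -u %s cp %s %s\n' "$project" "$uri" "$dest" ;;
        s3://*) printf 'aws s3 cp --request-payer requester %s %s\n' "$uri" "$dest" ;;
        *)      echo "unsupported URI: $uri" >&2; return 1 ;;
    esac
}

# Example:
# requester_pays_cmd gs://sra-pub-run-4/SRR9849809/SRR9849809.1 . my-billing-project
```

For the s3:// case the charges go to whichever AWS account your credentials belong to, so no project argument is needed there.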
Any idea where I can initiate this? It's not so obvious clicking around the NCBI/SRA website... I'll start a new thread.
Initiate? You need a valid google compute account (which can be set up using directions here). Generally you would have access via your institution (since they will pay for your account). Unless you intend to use google compute for analysis you may be best off getting the data via ENA link provided by @Jean above.
yes, of course. I regularly use AWS and have the gs utils installed as well. It seems we're right at the transition period. https://www.nlm.nih.gov/news/NLM_Moves_SRA_Cloud.html
As long as NCBI keeps free access available via sratoolkit (and ENA keeps fastq files available, which I believe they have committed to doing) all should be well. Not everyone would be able to have google/AWS accounts and pay for data downloads.
Sure, fasterq-dump is great.
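For anyone landing here later, the free sratoolkit route could look roughly like this. It is a sketch that assumes `prefetch` and `fasterq-dump` are on your PATH and that the run converts cleanly with `fasterq-dump` (I have not checked this particular PacBio run). The helper names `is_run_accession` and `fetch_fastq` are made up for illustration.

```shell
# Return 0 if the argument looks like a run accession (SRR/ERR/DRR + digits).
is_run_accession() {
    case $1 in
        [SED]RR*) digits=${1#???} ;;   # strip the three-letter prefix
        *)        return 1 ;;
    esac
    case $digits in
        ''|*[!0-9]*) return 1 ;;       # empty or contains a non-digit
    esac
    return 0
}

# Download a run with prefetch, then convert it to FASTQ in the given directory.
fetch_fastq() {
    acc=$1
    outdir=${2:-.}
    is_run_accession "$acc" || { echo "not a run accession: $acc" >&2; return 1; }
    prefetch "$acc" && fasterq-dump --outdir "$outdir" "$acc"
}

# Example (not run here):
# fetch_fastq SRR9849809 ./fastq
```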
I started a new thread here if you want to follow: NCBI SRA AWS AMI
joe : NCBI SRA support indicated that the cloud services are not ready for public use (August 2019). Some data downloads will require payment, some will not. A public announcement about cloud services will be coming in the near future.
I have no idea. I know there is a public ftp.