Question

downloading raw scRNA seq data from NCBI

0

24 months ago

bioinformatics ▴ 40

Hi,

I'm trying to download raw fastq files that were generated by scRNAseq from SRA in NCBI.

I'm using Terminal on a Mac.

I have run the following commands:

admins-MacBook-Air:~ mesalie$ cd desktop
admins-MacBook-Air:desktop mesalie$ mkdir tmp
mkdir: tmp: File exists
admins-MacBook-Air:desktop mesalie$ tar -xvzf sratoolkit.3.0.2-mac64.tar
admins-MacBook-Air:desktop mesalie$ ls
admins-MacBook-Air:desktop mesalie$ cd sratoolkit.3.0.2-mac64
admins-MacBook-Air:sratoolkit.3.0.2-mac64 mesalie$ ls
admins-MacBook-Air:sratoolkit.3.0.2-mac64 mesalie$ cd bin/
admins-MacBook-Air:bin mesalie$ ./vdb-config - I

However for the last command I get the following error message:

dyld: lazy symbol binding failed: Symbol not found: ____chkstk_darwin
  Referenced from: /Users/mesalie/Desktop/sratoolkit.3.0.2-mac64/bin/./vdb-config (which was built for Mac OS X 10.15)
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: ____chkstk_darwin
  Referenced from: /Users/mesalie/Desktop/sratoolkit.3.0.2-mac64/bin/./vdb-config (which was built for Mac OS X 10.15)
  Expected in: /usr/lib/libSystem.B.dylib

Does anyone know how I might correct this?

Thanks!

SRA terminal • 2.6k views

ADD COMMENT • link updated 24 months ago by ATpoint 86k • written 24 months ago by bioinformatics ▴ 40

0

It's probably an Intel binary and your MacBook is M1 (ARM) based? Try installing via conda (https://anaconda.org/bioconda/sra-tools), or use sra-explorer.info to get direct download links for your datasets of interest.
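
To check this guess, something like the following could be run in Terminal. It is only a sketch: it prints the CPU architecture and the macOS version, and compares the version with 10.15, the target named in the error message ("built for Mac OS X 10.15"). `version_ge` is a hypothetical helper written for this example, not part of any toolkit.

```shell
#!/usr/bin/env bash
# Sketch: report CPU architecture and macOS version, then compare the
# version against 10.15 (the build target quoted in the dyld error).

# version_ge A B: succeed if dotted version A >= B, compared field by field.
# Hypothetical helper for this example only.
version_ge() {
    local IFS=.
    local -a a b
    a=($1)
    b=($2)
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# sw_vers only exists on macOS, so skip the report elsewhere.
if command -v sw_vers >/dev/null 2>&1; then
    os_version=$(sw_vers -productVersion)
    echo "Architecture: $(uname -m)"   # arm64 = Apple silicon, x86_64 = Intel
    echo "macOS version: $os_version"
    if version_ge "$os_version" 10.15; then
        echo "macOS is at least 10.15"
    else
        echo "macOS is older than 10.15, the version the binary was built for"
    fi
fi
```

If the reported macOS version is older than 10.15, that alone may explain the missing symbol, whatever the architecture.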

ADD REPLY • link 24 months ago by ATpoint 86k

0

Ok I will check. The datasets I’m trying to analyse are not listed on sra-explorer.info

ADD REPLY • link 24 months ago by bioinformatics ▴ 40

0

sra-explorer queries NCBI, so if it is not there, it is likely not an NCBI dataset. Which one is it?

ADD REPLY • link 24 months ago by ATpoint 86k

0

GSE162454 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162454

ADD REPLY • link 24 months ago by bioinformatics ▴ 40

1

From that link, the SRA identifier is SRP295404. Type that into sra-explorer. Add everything to cart and you'll see all the FASTQ FTP download URLs.

Alternatively, run the following on the command line to get the FTP download URLs:

pip install ffq
ffq --ftp SRP295404
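
To turn that output into a plain list of URLs, a small filter along these lines might help. It is a sketch that assumes ffq prints JSON records with a "url" field; `extract_urls` is a hypothetical helper name, and if jq is installed, `jq -r '.[] | .url'` is a cleaner way to do the same thing.

```shell
#!/usr/bin/env bash
# extract_urls: read ffq's JSON from stdin and print one URL per line.
# Assumes each record has a "url": "..." field (hypothetical helper).
extract_urls() {
    grep -o '"url": *"[^"]*"' | sed 's/^"url": *"//; s/"$//'
}

# Usage (needs network access and ffq installed):
#   ffq --ftp SRP295404 | extract_urls > urls.txt
#   xargs -n 1 curl -O < urls.txt
```

Writing the URLs to a file first makes it easy to check the list before starting a large download.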

ADD REPLY • link 24 months ago by dsull ★ 7.0k

0

Thanks. How might I find the SRA identifier from the link?

ADD REPLY • link 24 months ago by bioinformatics ▴ 40

1

You mean from the GEO page you linked? It's below the Sample information section. You can also search sra-explorer for that BioProject (PRJNA...) number.

ADD REPLY • link 24 months ago by ATpoint 86k

1

That said, unless you really want to process the FASTQ files from scratch, you can just take the processed data under Supplementary file and start from there. That is most likely the CellRanger output containing raw counts per cell for every sample, which saves you a great deal of work.

If you do download the FASTQ files, the authors uploaded three files per run (example: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR13177101&display=metadata). The 8 bp one is the index read (not needed, can be discarded), the second, 28 bp one is UMI + barcodes, and the third one is the gene expression read. You need the second and third (corresponding to R1 and R2 from the sequencer) for preprocessing, be it CellRanger, STARsolo or other approaches such as salmon-alevin or kallisto-bustools.

As said, use the preprocessed data they provide unless there is a good reason not to. On a MacBook Air you do not want to do any preprocessing anyway; even with the most lightweight tools that is going to be painful.
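
If you end up with the three FASTQ files and are unsure which is which, checking the length of the first read is usually enough. The sketch below assumes the 8 bp / 28 bp layout described above; `first_read_length` and `classify_read` are hypothetical helper names, and the file name in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# first_read_length FILE: print the length of the first sequence in a
# FASTQ file (plain or gzip-compressed).
first_read_length() {
    local file=$1
    if [[ $file == *.gz ]]; then
        gzip -dc "$file"
    else
        cat "$file"
    fi | awk 'NR == 2 { print length($0); exit }'
}

# classify_read FILE: label a file by read length, assuming the layout
# described in this thread (8 bp index, 28 bp barcode + UMI).
classify_read() {
    local len
    len=$(first_read_length "$1")
    case $len in
        8)  echo "index read (can be discarded)" ;;
        28) echo "cell barcode + UMI (R1)" ;;
        *)  echo "gene expression read (R2), length $len" ;;
    esac
}

# Usage (example file name):
#   classify_read SRR13177101_2.fastq.gz
```

Only the first record is inspected, so this is quick even on large files, but it will mislabel files whose reads do not follow the lengths above.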

ADD REPLY • link 24 months ago by ATpoint 86k

0

If possible, find a download URL for the FASTQ files (e.g. with sra-explorer or ffq). The SRA Toolkit is not really well designed, and I personally avoid using it.

ADD REPLY • link 24 months ago by dsull ★ 7.0k

score 2 · Answer 1 · 2022-12-25

2

24 months ago

GenoMax 148k

Are you using the latest macOS binary?

Looking at your command above, it looks like you are not typing the correct command: you ran ./vdb-config - I, but the option is -i (lowercase, with no space after the dash). I just tried vdb-config -i on an M1 Mac and was able to get the configuration dialog.

ADD COMMENT • link 24 months ago by GenoMax 148k