Question

Weird problem when downloading SRR runs with parallel and fastq-dump

5 months ago

diqixiaoyaoer ▴ 20

Here is my script:

project='PRJNA1100523'

esearch -db sra -query $project | efetch -format runinfo > runinfo.csv
cat runinfo.csv | cut -d "," -f 1 > SRR.numbers
cat SRR.numbers | parallel fastq-dump --split-files --origfmt --gzip -X 1000 {}  ## test run: only the first 1000 spots of each run
cat SRR.numbers | parallel fastq-dump --split-files --origfmt --gzip {}  ## full download

I just want to download the SRR runs of this project.

PRJNA1100523 contains only 8 runs, and their SRR IDs are saved in SRR.numbers:

SRR28698742
SRR28698743
SRR28698744
SRR28698745
SRR28698738
SRR28698739
SRR28698740
SRR28698741

But when I run cat SRR.numbers | parallel fastq-dump --split-files --origfmt --gzip {}, extra fastq files for other, unexpected SRR IDs are downloaded into the folder.

For example, SRR29377445, SRR29377574 and SRR29413198 show up, and I do not know where they come from.

I am sure these strange SRR IDs are not saved in SRR.numbers.
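One way to check exactly which accessions reach parallel is to rebuild the list defensively and preview the commands before running them. This is a sketch for checking the input, not a diagnosis of where the extra runs come from. `extract_runs` is a hypothetical helper name, and the sketch assumes the run accession is the first comma-separated column of runinfo.csv (header `Run`) and that repeated header rows or blank lines may appear in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique run accessions found in an NCBI
# runinfo CSV, keeping their original order.
# Assumptions: the accession is the first comma-separated column, and the
# file may contain repeated "Run" header rows or blank lines.
extract_runs() {
  local csv="$1"
  # Keep only first-column values that look like run accessions
  # (SRR, ERR or DRR followed by digits) and drop duplicates.
  awk -F',' '$1 ~ /^[SED]RR[0-9]+$/ && !seen[$1]++ { print $1 }' "$csv"
}
```

Assuming GNU parallel is installed, `extract_runs runinfo.csv > SRR.numbers` followed by `parallel --dry-run fastq-dump --split-files --origfmt --gzip {} :::: SRR.numbers` prints every fastq-dump command without executing it, so any unexpected accession would be visible before anything is downloaded.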

I cannot work out what is wrong with my script, so I hope some of you can give me some advice or a solution.

Thanks in advance.

Tags: scRNA-seq, linux, parallel

score 0 · Answer 1 · 2024-06-16

5 months ago

Ming Tommy Tang ★ 4.5k

I would suggest using fastq-dl (https://github.com/rpetit3/fastq-dl) to download at the project level.

Thanks, I will give it a try.

Reply by diqixiaoyaoer, 5 months ago

score 0 · Answer 2 · 2024-06-18

5 months ago

atharvakarkare14 ▴ 40

Refer to this repository: SRA-tookit automation.

Thanks a lot. I have starred it and will give it a try too.

Reply by diqixiaoyaoer, 5 months ago

score 0 · Answer 3 · 2024-06-18

5 months ago

ijarne ▴ 10

Hey, I did this during my bachelor's thesis. I used the SRA Toolkit and fastq-dump (installed via Anaconda), and by running this script on my laptop I was able to download the list of SRR entries I wanted. You type in the SRR accessions you want to download, and when you execute the bash script they are downloaded and converted to FASTQ format. It was a bit slow, but it did the job. Sorry if the code is not very clean; it was the best I could do in my bachelor's days.

https://github.com/Iggi-29/SRA-fastq-downloader/blob/master/getSRAfastq.sh

Please feel free to reuse the code if you need it.
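The workflow described in this answer (a list of accessions, each downloaded and then converted to FASTQ) can be sketched roughly as follows. This is not the linked script; it assumes the SRA Toolkit's prefetch and fastq-dump are on PATH, and `download_runs`, `run` and the DRY_RUN switch are hypothetical names added for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a sequential "prefetch, then convert" loop over an accession list.
# Assumption: prefetch and fastq-dump from the SRA Toolkit are on PATH
# (for example after installing the toolkit with conda).

# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical function: process every run listed in $1, writing FASTQ to $2.
download_runs() {
  local list="$1" outdir="${2:-fastq}" acc
  run mkdir -p "$outdir"
  # "|| [ -n "$acc" ]" also handles a last line without a trailing newline.
  while IFS= read -r acc || [ -n "$acc" ]; do
    # Skip header lines, blank lines and anything that is not a run accession.
    [[ "$acc" =~ ^[SED]RR[0-9]+$ ]] || continue
    # Download the run's .sra file first.
    run prefetch "$acc"
    # Convert to gzipped FASTQ, one file per read for paired-end runs.
    # Assumption: the toolkit finds the prefetched copy; if it does not,
    # fastq-dump retrieves the run over the network instead.
    run fastq-dump --split-files --gzip --outdir "$outdir" "$acc"
  done < "$list"
}
```

For example, `DRY_RUN=1 download_runs SRR.numbers fastq` lists the commands that would run; dropping `DRY_RUN=1` executes them. Processing runs one at a time is slower than using parallel but keeps the output easy to follow.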

Maybe change /home/ignasi/sratoolkit.3.0.0-ubuntu64/bin/prefetch to $PREFETCH, so that your script does not rely on a hard-coded absolute path that will be invalid on other machines: https://github.com/Iggi-29/SRA-fastq-downloader/blob/master/getSRAfastq.sh#L72
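One common way to follow this suggestion is to let an environment variable override the tool location and otherwise fall back to whatever is on PATH. The helper name `resolve_tool` is hypothetical, and the variable name PREFETCH is taken from the comment above; this is a sketch, not a patch to the linked script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a tool, preferring an explicit
# override stored in an environment variable over a PATH lookup.
#   $1: name of the environment variable (e.g. PREFETCH)
#   $2: command to look up on PATH when that variable is unset or empty
resolve_tool() {
  local override="${!1:-}"   # bash indirect expansion: value of the variable named in $1
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  elif command -v "$2" >/dev/null 2>&1; then
    command -v "$2"
  else
    echo "error: $2 not found; set $1 or add $2 to PATH" >&2
    return 1
  fi
}
```

In a download script this could replace the hard-coded path with `PREFETCH="$(resolve_tool PREFETCH prefetch)" || exit 1`, after which `"$PREFETCH" "$acc"` is called as before.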

Reply by Ram, 5 months ago

I think it is cool! I am also a bioinformatician, but I mainly focus on data analysis in R and Python.

I am sure I can learn something from your script. Thank you very much.

Reply by diqixiaoyaoer, 5 months ago