Downloading SRR runs with parallel and fastq-dump produces unexpected extra files
5 months ago

Here is my script

project='PRJNA1100523'

esearch -db sra -query $project | efetch -format runinfo > runinfo.csv
cat runinfo.csv | cut -d "," -f 1 > SRR.numbers
cat SRR.numbers | parallel fastq-dump --split-files --origfmt --gzip -X 1000 {}  ## just test
cat SRR.numbers | parallel fastq-dump --split-files --origfmt --gzip {}  ## download complete

I just want to download SRR files here.

PRJNA1100523 contains only 8 SRR runs, and their IDs are saved in SRR.numbers.

But when I run cat SRR.numbers | parallel fastq-dump --split-files --origfmt --gzip {}, FASTQ files for extra, unexpected SRR runs are also downloaded into the folder.

SRR28698742
SRR28698743
SRR28698744
SRR28698745
SRR28698738
SRR28698739
SRR28698740
SRR28698741

The unexpected files belong to runs such as SRR29377445, SRR29377574 and SRR29413198, and I do not know where they come from.

I am sure these strange SRR IDs are not saved in SRR.numbers.

I do not know what is wrong with my script, so I hope someone can offer advice or a solution.

Thanks in advance.
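
One thing that can be checked in the script above: the runinfo CSV starts with a header row whose first field is `Run`, so an ID list built with `cut` alone contains at least one line that is not an accession. The following is a minimal sketch, assuming bash and standard `cut`, `grep` and `sort`; `extract_runs` is a hypothetical helper name, not part of EDirect or SRA Toolkit. It does not by itself explain the extra runs, but it makes the input passed to parallel easy to verify.

```shell
#!/usr/bin/env bash
# Sketch: keep only well-formed run accessions from the first CSV column.
# extract_runs is a hypothetical helper, not an EDirect or SRA Toolkit command.
extract_runs() {
    cut -d ',' -f 1 \
        | grep -E '^[SED]RR[0-9]+$' \
        | sort -u    # drops duplicates; note that this also sorts the IDs
}

# Example usage (needs the runinfo.csv produced by efetch):
#   extract_runs < runinfo.csv > SRR.numbers
#   wc -l < SRR.numbers    # compare with the number of runs expected
```

The pattern accepts SRR, ERR and DRR accessions and discards header rows and blank lines, so the line count of the result can be compared directly with the number of runs listed for the project.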

scRNA-seq linux parallel • 847 views
5 months ago
Ming Tommy Tang ★ 4.5k

I suggest using fastq-dl (https://github.com/rpetit3/fastq-dl) to download at the project level.


Thanks, I will give it a try.

5 months ago

Refer to this repository: SRA-toolkit automation.


Thanks a lot. I have starred it and will give it a try as well.

5 months ago
ijarne ▴ 10

Hey, I did this during my bachelor's thesis. I used SRA Toolkit and fastq-dump (installed via Anaconda), and by running this script on my laptop I was able to download the list of SRR entries I wanted. You enter the names of the SRR archives you want, and when you execute the bash file they are downloaded and converted to FASTQ format. It was a bit slow, but it did the job. Sorry if the code is not very clean; it was the best I could do at the time.

https://github.com/Iggi-29/SRA-fastq-downloader/blob/master/getSRAfastq.sh

Please feel free to reuse the code if you need it.


Maybe change /home/ignasi/sratoolkit.3.0.0-ubuntu64/bin/prefetch to $PREFETCH so that your script does not depend on a hard-coded absolute path that is invalid on other machines: https://github.com/Iggi-29/SRA-fastq-downloader/blob/master/getSRAfastq.sh#L72
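
A minimal sketch of that suggestion, assuming bash. `resolve_tool` is a hypothetical helper name, and the `PREFETCH` variable is taken from the comment above rather than from the linked script. An explicit override wins; otherwise the tool is looked up on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: resolve a tool's location without hard-coding an absolute path.
# resolve_tool is a hypothetical helper; it is not part of SRA Toolkit.
resolve_tool() {
    local name="$1" override="${2:-}"
    if [ -n "$override" ]; then
        # The caller supplied a path (e.g. via the PREFETCH variable): use it.
        printf '%s\n' "$override"
    else
        # Otherwise search PATH; returns non-zero if the tool is missing.
        command -v "$name"
    fi
}

# Example usage (assumes prefetch is installed or PREFETCH is exported):
#   PREFETCH="$(resolve_tool prefetch "${PREFETCH:-}")" || {
#       echo "prefetch not found; set PREFETCH or add it to PATH" >&2
#       exit 1
#   }
#   "$PREFETCH" SRR28698742
```

With this pattern the script runs unchanged on any machine where the toolkit is on `PATH`, and a non-standard install location can still be supplied through the environment.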


I think it is cool! I am also a bioinformatician, but I mainly focus on data analysis in R and Python.

I believe I can learn something from your script. Very grateful.
