How to parallelize fastq-dump command when reading SRA IDs from a .txt file?
7.3 years ago
bioinform ▴ 30

How can I parallelize the fastq-dump command when reading SRA IDs from a .txt file?

Here is my working code without parallelization; it downloads one pair of fastq files at a time:

    list=$(cat SRAIdFromPythonInput.txt)  # list of SRA record IDs, one per line
    for i in $list
    do
        echo "$i"
        ./fastq-dump --split-files "$i" -v
    done

How can I rewrite it with GNU parallel so that it downloads the data for all SRA IDs listed in the .txt file concurrently, rather than one pair of fastq files at a time? How do I apply the pattern "cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output" to this code?

paralell gnu shell fastq-dump sra • 5.4k views

I'm too lazy to check/test: what would be the generated files for one given ID ?


Two fastq files, named after the SRA ID.


What would the names be? ID.fq.gz? ID.fastq? ID_R1.fq? ID_R1.fastq.gz?


ID.fastq, as a pair; I rename them in the next step. For example:

SRR5656566_1.fastq and SRR5656566_2.fastq

7.3 years ago

Using a Makefile:

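IDS=$(shell cat SRAIdFromPythonInput.txt)

# fastq-dump writes both mates in one run, so _2 only depends on _1.
%_2.fastq: %_1.fastq
	touch -c $@

# Recipe lines must begin with a TAB character, not spaces.
%_1.fastq:
	./fastq-dump --split-files $* -v && touch -c $@

all: $(addsuffix _2.fastq,$(IDS)) $(addsuffix _1.fastq,$(IDS))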

Invoke it with make and the number of parallel jobs, e.g.:

make -j 16
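
One property of the Makefile approach is that an ID whose output files already exist is not downloaded again. A minimal sketch of the same idea in plain shell follows, for filtering the ID list before handing it to any parallel runner. The function name pending_ids is hypothetical, the file names follow the ID_1.fastq / ID_2.fastq convention mentioned in this thread, and the check for non-empty files is an assumption that is slightly stricter than what make does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SRA IDs from stdin and print only those whose
# paired FASTQ files are not both present and non-empty in directory $1
# (default: the current directory).
pending_ids() {
  local dir="${1:-.}"
  local id
  # "|| [ -n "$id" ]" keeps the last line even without a trailing newline.
  while read -r id || [ -n "$id" ]; do
    # Skip blank lines; read already trims surrounding whitespace.
    [ -z "$id" ] && continue
    if [ ! -s "$dir/${id}_1.fastq" ] || [ ! -s "$dir/${id}_2.fastq" ]; then
      printf '%s\n' "$id"
    fi
  done
}
```

It could be used as: pending_ids < SRAIdFromPythonInput.txt | parallel -j 4 './fastq-dump --split-files {} -v'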

Thank you for your efforts. Could you please write this code following the GNU parallel pattern: cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output? Why do you use a Makefile? Is there any tutorial, article, or chapter on using it in bioinformatics? I have never used a Makefile for NGS data processing. I found one at http://bsmith89.github.io/make-bml/


could you please write these codes in a manner of the pattern of the GNU parallel

no

why do you use Makefile?

Because it works, and it's easy, standard, ubiquitous, universal, etc.


Thanks, but I still need code examples using GNU parallel.

7.3 years ago
ole.tange ★ 4.5k

It is unclear to me what SRAIdFromPythonInput.txt contains. Can you give a couple of lines as example?

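# Process one SRA ID; GNU parallel calls this once per input line.
doit() {
  i="$1"
  echo "$i"
  ./fastq-dump --split-files "$i" -v
}
# Export the function so the shells started by parallel can see it (bash only).
export -f doit
# :::: reads the arguments from the file, one per line.
parallel doit :::: SRAIdFromPythonInput.txt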
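
For the "cat list | parallel ..." pattern asked about in the question, a minimal sketch might look like the following. It assumes GNU parallel is installed and that ./fastq-dump sits in the current directory, as in the question. The function names clean_ids and dump_all, the FASTQ_DUMP override variable, and the default of 4 jobs are illustrative choices, not part of either tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop blank lines and surrounding whitespace from an
# ID list on stdin, printing one clean ID per line.
clean_ids() {
  awk 'NF { print $1 }'
}

# Hypothetical wrapper: run fastq-dump for every ID in file $1, with at most
# $2 jobs at a time (default 4, an arbitrary choice).
# GNU parallel substitutes each input line for {}.
dump_all() {
  local id_file="$1"
  local jobs="${2:-4}"
  clean_ids < "$id_file" |
    parallel -j "$jobs" "${FASTQ_DUMP:-./fastq-dump} --split-files {} -v"
}
```

Calling dump_all SRAIdFromPythonInput.txt 8 would then start up to eight downloads at once. Adding --dry-run to the parallel call prints the generated commands without running them, which is a cheap way to check the ID list first.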

It contains a column of SRA IDs:

 SRR5656566
 SRR5656567
 SRR5656518
 SRR5656500

Thanks.
