Question

fastq-dump -I --split-files

5.6 years ago

mustafa_aljadi ▴ 10

Hey all, I am trying to do RNA-seq analysis in CLC, but before transferring my data to the program I need to split my reads into forward and reverse. The command fastq-dump -I --split-files does this. The problem is that each of the 75 RNA-seq files has its own SRR number. I have tried many times to run the command on all my RNA-seq reads, but it fails every time. Can you please tell me what I should do? Your help is highly appreciated. Thanks, Mustafa

RNA-Seq • 4.7k views

ADD COMMENT • link updated 5.6 years ago by ATpoint 88k • written 5.6 years ago by mustafa_aljadi ▴ 10

I have already downloaded the files from NCBI. To transfer the 75 RNA-seq files to CLC, they need to be split into forward and reverse reads. I used the command fastq-dump -I --split-files SRR390728.sra, which works for a single file, for example SRR390728. My question is how can I split the 75 RNA seq files together without using SRR numbers? Thanks, Mustafa

ADD REPLY • link 5.6 years ago by mustafa_aljadi ▴ 10

My question is how can I split the 75 RNA seq files together without using SRR numbers?

@padwalmk has a potential solution for how to do this.

ADD REPLY • link 5.6 years ago by GenoMax 152k

Sorry, I didn't understand this part of your answer: "Just define number of threads available and the output directory. Put this script in your sra file directory and change permission with chmod u+x script." Can you please explain it to me? Thanks, Mustafa

ADD REPLY • link 5.6 years ago by mustafa_aljadi ▴ 10

@padwalmk provided you with code for a bash script. You will need to put it in a file (say, script.sh) in the directory where you downloaded all the .sra files. You should also change the number that follows --threads (16 in the script) to the number of cores available on your CPU (on a typical local computer that may be 4 or 8). You will also need the GNU parallel program (search for it and install it if needed). You can then run the script with something like bash script.sh at the command prompt.
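Before running the script, it may help to confirm that the tools are installed, that the .sra files are where you think they are, and how many cores you can pass to --threads. Below is a minimal sketch of such a check. The function name preflight is hypothetical, and the fallback of 4 threads is only an assumption for systems without nproc.

```shell
#!/bin/bash
# Pre-flight check: are the named tools on PATH, how many .sra files are in
# the current directory, and how many cores could be passed to --threads?
# "preflight" is a hypothetical helper name, not part of any toolkit.
preflight() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
    count=0
    for f in *.sra; do
        # When nothing matches, the pattern stays literal, so test that the file exists
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "sra files: $count"
    # nproc ships with GNU coreutils; 4 is an assumed fallback, not a recommendation
    echo "threads: $(nproc 2>/dev/null || echo 4)"
}
```

Run it from the directory holding the downloads, for example: preflight parallel-fastq-dump fastq-dump. A "missing" line or a count of 0 would explain why the script fails straight away.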

ADD REPLY • link 5.6 years ago by GenoMax 152k

GenoMax · Answer 1 · 2019-12-25

Hi, you can first try parallel-fastq-dump, which splits each SRA file using multiple threads, so it is faster. First install parallel-fastq-dump, then run the following script; it loops over all the .sra files and splits them using the NCBI SRA Toolkit.

Just define number of threads available and the output directory. Put this script in your sra file directory and change permission with chmod u+x script.

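#!/bin/bash
# Convert every .sra file in the current directory to gzipped FASTQ.
fastq_dump(){
for i in *.sra
do
# Skip the literal pattern when no .sra files are present
[ -e "$i" ] || continue
echo "Starting the dump of $i"
# Replace 16, /outputdir and /temp_dir with your thread count and directories
parallel-fastq-dump -s "$i" --split-3 --threads 16 -O /outputdir --gzip --tmpdir /temp_dir
echo "Completed dump of $i"
done
}
fastq_dump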
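If you would rather stay with plain fastq-dump, the same loop idea works with the exact command from the question, just more slowly because the files are processed one at a time. This is only a sketch: the function name split_all_sra and the DUMP_CMD variable are hypothetical helpers (not SRA Toolkit features), added so the loop can be previewed without converting anything.

```shell
#!/bin/bash
# Split every .sra file in the current directory into forward/reverse FASTQ
# files with the command from the question (fastq-dump -I --split-files).
# DUMP_CMD is a hypothetical override for dry runs: DUMP_CMD=echo prints the
# arguments that would be passed instead of running fastq-dump.
split_all_sra() {
    dump_cmd="${DUMP_CMD:-fastq-dump}"
    for f in *.sra; do
        # When nothing matches, the pattern stays literal; skip it
        [ -e "$f" ] || continue
        echo "Splitting $f"
        # Report a failure but carry on with the remaining files
        "$dump_cmd" -I --split-files "$f" || echo "FAILED: $f" >&2
    done
}
```

Try DUMP_CMD=echo split_all_sra first to see which files would be processed, then run split_all_sra on its own. No SRR numbers need to be typed because the *.sra pattern picks up every downloaded file.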

score 1 · Answer 2 · 2019-12-23

5.6 years ago

GenoMax 152k

Just get the FASTQ files directly from EBI-ENA. You can use a neat tool called SRA-explorer to generate URLs for all the sequences you want to download. See: sra-explorer : find SRA and FastQ download URLs in a couple of clicks
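Once you have the URLs, downloading them can be scripted as well. The sketch below assumes you saved them one per line in a text file; the function name download_urls and the FETCH_CMD variable are hypothetical, and wget is just one possible downloader.

```shell
#!/bin/bash
# Download every URL listed in a text file (one URL per line) into the
# current directory. Blank lines and lines starting with # are skipped.
# FETCH_CMD is a hypothetical override for dry runs: FETCH_CMD=echo prints
# the URLs that would be fetched instead of downloading them.
download_urls() {
    url_file="$1"
    fetch_cmd="${FETCH_CMD:-wget}"
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # Report a failure but carry on with the remaining URLs
        "$fetch_cmd" "$url" || echo "FAILED: $url" >&2
    done < "$url_file"
}
```

For example, FETCH_CMD=echo download_urls urls.txt lists what would be downloaded, and download_urls urls.txt fetches the files (urls.txt being whatever name you gave the saved list). Paired-end runs from ENA usually arrive as separate _1 and _2 files, so no splitting step is needed afterwards.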

ADD COMMENT • link 5.6 years ago by GenoMax 152k

score 1 · Answer 3 · 2020-01-06

5.6 years ago

ATpoint 88k

You can get FASTQ files directly from ENA, as outlined in this tutorial: Fast download of FASTQ files from the European Nucleotide Archive (ENA)

It also covers usage of prefetch and (parallel-)fastq-dump if you want to go with that.

ADD COMMENT • link 5.6 years ago by ATpoint 88k