What is the difference between --split-files and --split-3 when using fastq-dump?
8.7 years ago
elenajmichel ▴ 90

Hello,

I have been using --split-files with fastq-dump, but I have seen a lot of posts recommending --split-3 instead. The manual page is not clear about the difference between the two options (besides the number of files generated). Could someone explain the difference, and under what circumstances one is preferable to the other?

Thank you.

RNA-Seq fastq-dump paired end reads • 13k views

For the --split-3 option, please see the thread "fastq-dump split-3 output".

For the --split-files option, please see the comment in "SRA to BAM".


Thank you! The fastq-dump split-3 output link helped a lot.

8.7 years ago
ohadg123 ▴ 30

According to the manual, it looks like --split-files creates a file for every read, meaning the output will have a lot of files.

--split-3 is for paired-end reads. So it will produce 2 fastq files only.

If your data is single-end you don't need to use these options.


--split-files actually behaves the same as --split-3 and produces 2 fastq files for each SRA, hence my confusion.


"--split-3 will output 1,2, or 3 files: 1 file means the data is not paired. 2 files means paired data with no low quality reads or reads shorter than 20bp. 3 files means paired data, but asymmetric quality or trimming. in the case of 3 file output, most people ignore <file>.fastq . this is a very old formatting option introduced for phase1 of 1000genomes. before there were many analysis or trimming utilities and SRA submissions always contained all reads from sequencer. back then nobody wanted to throw anything away. you might want to use --split-files instead. that will give only 2 files for paired-end data. or not bother with text output and access the data directly using sra ngs apis." (from Question: fastq-dump split-3 output )
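The one-, two-, and three-file outcomes described in that quote can be told apart by checking which files exist after the dump. A minimal sketch, assuming the default output names (`<accession>_1.fastq`, `<accession>_2.fastq`, `<accession>.fastq`) and uncompressed output in one directory; the function name `classify_split3_output` is hypothetical and not part of sra-tools:

```shell
#!/bin/sh
# Hypothetical helper (not part of sra-tools): report which --split-3 case
# a finished dump corresponds to, based only on which files exist.
# Assumes it is run after something like: fastq-dump --split-3 -O "$dir" "$acc"
classify_split3_output() {
    dir=$1
    acc=$2
    has_pairs=no
    has_single=no
    # Both mate files must be present to count as paired output.
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        has_pairs=yes
    fi
    # <accession>.fastq holds reads that have no mate in the output.
    if [ -f "$dir/${acc}.fastq" ]; then
        has_single=yes
    fi
    case "$has_pairs/$has_single" in
        yes/no)  echo "paired" ;;
        yes/yes) echo "paired+unpaired" ;;
        no/yes)  echo "single" ;;
        *)       echo "none" ;;
    esac
}
```

For example, `classify_split3_output . "$acc"` (with `$acc` set to your own run accession) prints `paired+unpaired` when all three files are present, which is the case where most people ignore the third file.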


Actually, --split-3 outputs 3 files: two files for the paired reads and another one for singletons. I just ran --split-files on my end and it output only 2 files. I guess the difference is the singleton output, but I'm not sure.


Is your data paired-end or single-end?


For paired-end reads, the --split-files option outputs two fastq files: one for the forward reads and one for the reverse reads.
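With paired-end data, the forward and reverse files would normally be expected to hold the same number of reads, so comparing the counts is a quick sanity check after a dump. A rough sketch, assuming uncompressed FASTQ with exactly 4 lines per record; the helper names `count_fastq_reads` and `mates_match` are hypothetical and not part of sra-tools:

```shell
#!/bin/sh
# Hypothetical helpers (not part of sra-tools).

# Count FASTQ records, assuming 4 lines per record (unwrapped sequences).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Succeed only if the forward and reverse files hold the same number of reads.
mates_match() {
    [ "$(count_fastq_reads "$1")" = "$(count_fastq_reads "$2")" ]
}
```

Usage would look like `mates_match "${acc}_1.fastq" "${acc}_2.fastq" && echo "mate files are in step"`, where `$acc` is your own run accession. A mismatch suggests that some reads lack a mate in the output.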

8.7 years ago
GenoMax 148k

Explained in the inline help for fastq-dump.

$ fastq-dump -h

  --split-files                    Dump each read into separate file.Files 
                                   will receive suffix corresponding to read 
                                   number 
  --split-3                        Legacy 3-file splitting for mate-pairs: 
                                   First biological reads satisfying dumping 
                                   conditions are placed in files *_1.fastq and 
                                   *_2.fastq If only one biological read is 
                                   present it is placed in *.fastq Biological 
                                   reads 3 and above are ignored.

Hi, as I stated in my question I don't think it is clearly explained because the explanation for --split-files makes it sound as if a separate file will be created for each read, which isn't correct as it just splits them into two files. Based on that, I don't really understand why you would use --split-3 vs --split-files to get the same result.


"fastq-dump" is used with Illumina data, and in Illumina terminology a "read" traditionally refers to Read 1 or Read 2, not to the individual reads in the files. A separate file for each individual read would not make sense.
