Question

fastq-dump split-3 output

4

9.9 years ago

colonppg ▴ 120

If I run fastq-dump --split-3 on an SRA file, I get:

file_1.fastq
file_2.fastq
file.fastq

My question is: how should I handle file.fastq? Should I just ignore it?

fastq-dump sra • 23k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by colonppg ▴ 120

0

Thanks so much, guys!

ADD REPLY • link 9.9 years ago by colonppg ▴ 120

2

9.9 years ago

Devon Ryan 105k

Generally, file.fastq is much smaller than the other two and holds the reads that were orphaned by trimming. I would typically ignore that file unless you really need the extra reads. The exception is if file.fastq is larger than either of the other two fastq files. In that case, you probably want to ignore the paired-end reads (or just use them to get things like the insert size distribution).
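To apply the size comparison above, it can help to count reads rather than eyeball file sizes. The following is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; the function names and the `file` prefix are illustrative and not part of sra-tools.

```python
from pathlib import Path


def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def summarize_split3(prefix):
    """Return read counts for whichever --split-3 style files exist for `prefix`.

    `prefix` is the path without suffix, e.g. "file" for file_1.fastq,
    file_2.fastq and file.fastq (hypothetical names taken from the question).
    """
    counts = {}
    for suffix in ("_1.fastq", "_2.fastq", ".fastq"):
        path = Path(prefix + suffix)
        if path.exists():
            counts[path.name] = count_fastq_reads(path)
    return counts
```

Calling `summarize_split3("file")` returns a dictionary of read counts keyed by file name. If the count for `file.fastq` is small relative to the other two, ignoring it is probably reasonable.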

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by Devon Ryan 105k

Ram · Accepted Answer · 2015-09-04

10

9.9 years ago

osullivanchristopher ▴ 210

--split-3 will output 1, 2, or 3 files. 1 file means the data is not paired. 2 files means paired data with no low-quality reads or reads shorter than 20 bp. 3 files means paired data with asymmetric quality or trimming. In the 3-file case, most people ignore <file>.fastq. This is a very old formatting option introduced for phase 1 of 1000 Genomes, before there were many analysis or trimming utilities, when SRA submissions always contained all reads from the sequencer. Back then nobody wanted to throw anything away. You might want to use --split-files instead, which gives only 2 files for paired-end data. Or don't bother with text output and access the data directly using the SRA NGS APIs.
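The 1/2/3-file rule described above can be sketched as a small check on which output files exist. This is only an illustration: it assumes the default `<prefix>_1.fastq`, `<prefix>_2.fastq`, and `<prefix>.fastq` naming with uncompressed output, and the function name and return labels are made up for this example.

```python
from pathlib import Path


def classify_split3_output(directory, prefix):
    """Guess the layout of a --split-3 run from the files it left behind.

    Assumes default, uncompressed output names; `prefix` is typically the
    run accession or the input file name without extension.
    """
    d = Path(directory)
    has_1 = (d / f"{prefix}_1.fastq").exists()
    has_2 = (d / f"{prefix}_2.fastq").exists()
    has_single = (d / f"{prefix}.fastq").exists()

    if has_1 and has_2:
        # A third file alongside the pair holds reads whose mate was dropped.
        return "paired with orphans" if has_single else "paired"
    if has_single and not (has_1 or has_2):
        return "unpaired"
    return "unexpected"
```

For the situation in the question, `classify_split3_output(".", "file")` would return `"paired with orphans"`.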

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by osullivanchristopher ▴ 210

0

On some occasions with --split-files, you have to use -M 0, or else you end up with unpaired reads, because fastq-dump discards short reads but keeps their mates.
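One way to see whether a --split-files run left unpaired reads is to compare the read IDs in the two files. The sketch below assumes uncompressed, four-line FASTQ records and that both mates share the same first header token (for example `SRR0.1`, a made-up ID), which may not hold if read-number suffixes were appended to the names. It loads all IDs into memory, so it suits modest file sizes only. The function names are hypothetical.

```python
def read_ids(path):
    """Collect read IDs (first header token, '@' stripped) from a FASTQ file."""
    ids = set()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # Every fourth line, starting at 0, is a record header.
            if line_number % 4 == 0 and line.strip():
                ids.add(line[1:].split()[0])
    return ids


def unpaired_ids(path_1, path_2):
    """Return (IDs only in path_1, IDs only in path_2)."""
    ids_1 = read_ids(path_1)
    ids_2 = read_ids(path_2)
    return ids_1 - ids_2, ids_2 - ids_1
```

If both returned sets are empty, every read in one file has a mate in the other. Note that this checks membership only, not that the two files list the reads in the same order.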

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by h.mon 35k