If I run fastq-dump --split-3 on an SRA file, I get three files:
file_1.fastq
file_2.fastq
file.fastq
My question is how I should handle file.fastq. Should I just ignore it?
--split-3 will output 1, 2, or 3 files. 1 file means the data is not paired. 2 files means paired data with no low-quality reads or reads shorter than 20 bp. 3 files means paired data with asymmetric quality or trimming, so some reads are left without a mate. In the 3-file case, most people ignore <file>.fastq. This is a very old formatting option introduced for phase 1 of 1000 Genomes, before there were many analysis or trimming utilities, when SRA submissions always contained all reads from the sequencer. Back then nobody wanted to throw anything away. You might want to use --split-files instead, which will give only 2 files for paired-end data. Alternatively, skip text output altogether and access the data directly through the SRA NGS APIs.
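As a rough illustration of the 1/2/3-file rule above, here is a minimal Python sketch that reports which case a --split-3 run landed in by checking which output files exist. The function name classify_split3_output and the returned labels are made up for this example, and it assumes the default naming shown in the question (prefix_1.fastq, prefix_2.fastq, prefix.fastq), uncompressed.

```python
from pathlib import Path


def classify_split3_output(out_dir, prefix):
    """Report which --split-3 case applies, based on which files exist.

    Assumes the naming from the question: <prefix>_1.fastq,
    <prefix>_2.fastq and <prefix>.fastq in out_dir (hypothetical helper).
    """
    out_dir = Path(out_dir)
    has_mates = (out_dir / f"{prefix}_1.fastq").exists() and (
        out_dir / f"{prefix}_2.fastq"
    ).exists()
    has_unpaired = (out_dir / f"{prefix}.fastq").exists()

    if has_mates and has_unpaired:
        return "paired-end with orphans"  # 3 files
    if has_mates:
        return "paired-end"  # 2 files
    if has_unpaired:
        return "single-end"  # 1 file
    return "unknown"  # nothing matching the expected names
```

For the files in the question, classify_split3_output(".", "file") would fall into the 3-file case.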
Generally file.fastq is much smaller than the other two and is the result of reads orphaned by trimming. I would typically ignore that file unless you really need the extra reads. The exception is if file.fastq is larger than either of the other two FASTQ files. In that case, you probably want to ignore the paired-end reads (or just use them to get things like the insert size distribution).
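The rule of thumb above can be checked with a few lines of Python. This is only a sketch: the helper names count_fastq_records and choose_reads are invented here, it compares read counts rather than file sizes (my assumption; comparing sizes on disk would be a cheaper proxy), and it assumes uncompressed FASTQ with exactly four lines per record.

```python
def count_fastq_records(path):
    """Count reads, assuming plain-text FASTQ with 4 lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def choose_reads(mate1, mate2, unpaired):
    """Suggest which --split-3 output to analyse (hypothetical helper).

    Returns "unpaired" only when the unpaired file holds more reads
    than each of the two mate files; otherwise returns "paired".
    """
    n_paired = max(count_fastq_records(mate1), count_fastq_records(mate2))
    n_unpaired = count_fastq_records(unpaired)
    return "unpaired" if n_unpaired > n_paired else "paired"
```

With the files from the question, this would be called as choose_reads("file_1.fastq", "file_2.fastq", "file.fastq"). In the usual case it returns "paired", meaning file.fastq can be left out.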
Thanks so much, guys!