I am using fasterq-dump to download from SRA, with --split-files to split paired-end reads. As a result I receive one or two files. When I get two files, one is named *_1.fastq and the other is named *_2.fastq, *_3.fastq, or *_4.fastq.
I cannot find what these numbers mean.
The command I am using:
fasterq-dump --split-files -O /media/lab/fastq ERR016705
For example:
ERR016705 has 2 files: _1, _4
ERR015587 has 2 files: _1, _2
I am also confused by this. On the HowTo page, they say you could get 1 and 2.fastq files for paired reads, and a 3.fastq for unmated reads. But in item 8, they list 1 and 2 and a plain .fastq. Is this plain .fastq also for unmated reads? Or is it different from the 3.fastq? After reading this post, I'm not sure whether the .fastq file contains the unmated reads, or low-quality reads that should be ignored.
That option was likely used with older data. If you are looking at something recent, then the chances of getting a third file should be small, unless submitters have supplied data from an index read as a separate file. In the case of single-cell data, the 10x cellranger software produces a separate file of index reads when used for demultiplexing.

For anyone reaching this post by search in the future: ERR016705 now shows just two fastq files at ENA.
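
If you want to check quickly which numbered files a run actually gave you, something like the following may help. It is only a minimal sketch: `list_suffixes` is a hypothetical helper (not part of the SRA Toolkit), and it assumes the files are named `<accession>_<N>.fastq` or `<accession>.fastq`, as in the examples in this thread. It only reports which suffixes are present; it does not tell you what each number means.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the suffix of every FASTQ file found for one
# accession in a directory. Assumes names of the form <accession>_<N>.fastq
# or <accession>.fastq (an assumption based on the files seen in this thread).
list_suffixes() {
    local dir=$1 acc=$2 f name
    for f in "$dir/$acc".fastq "$dir/$acc"_*.fastq; do
        [ -e "$f" ] || continue      # skip names/patterns that matched nothing
        name=${f##*/}                # drop the directory part
        name=${name%.fastq}          # drop the extension
        name=${name#"$acc"}          # drop the accession, leaving "" or "_N"
        echo "${name:-(no suffix)}"
    done
}

# Example call (paths taken from the question):
# list_suffixes /media/lab/fastq ERR016705
```

Running it on the output directory from the question would show, per accession, whether you ended up with `_1` and `_2`, `_1` and `_4`, or a plain file with no suffix.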