I downloaded paired-end Illumina reads from the NCBI SRA and ran fastq-dump --split-3
to get a legacy extraction of the corresponding FASTQ files.
I ended up with three files: file_1.fastq.gz, file_2.fastq.gz and a third one, file.fastq.gz. The third one corresponds to 492919 reads whose readlen < 1.
These fastq.gz files are huge. Simply counting the lines takes too long, and a test that extracts the read names and coordinates and compares their order between the two files would take even longer.
So I would rather ask here about previous experiences.
- Should I understand that name_1.fastq and name_2.fastq are synchronized files, that is, that the left and right reads are in the same order? I ask because the size difference between the two files (the _1 and the _2) is notable.
- Is there any script that would allow me to synchronize these two files in case I need to?
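On the second point, here is a minimal sketch of what such a synchronization script could look like. It is not an established tool. It assumes plain 4-line FASTQ records (no wrapped sequences), that both mates share the same first token in the header line apart from an optional /1 or /2 suffix, and that the whole second file fits in memory, which may not hold for very large runs. The names `pair_key`, `records` and `resync` are hypothetical.

```python
import gzip


def open_fastq(path, mode="rt"):
    """Open a FASTQ file, transparently handling .gz compression."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode)


def pair_key(header):
    """Reduce a header line to a name shared by both mates (assumed layout)."""
    fields = header.split()
    if not fields:
        return ""
    name = fields[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def records(path):
    """Yield (name, record_text) for each 4-line FASTQ record."""
    with open_fastq(path) as handle:
        while True:
            block = [handle.readline() for _ in range(4)]
            if not block[0].strip():
                break  # end of file or trailing blank line
            text = "".join(block)
            if not text.endswith("\n"):
                text += "\n"
            yield pair_key(block[0]), text


def resync(in_1, in_2, out_1, out_2):
    """Write only reads present in both inputs, in the order of in_1.

    Returns the number of pairs written. The second input is held in memory.
    """
    mates = dict(records(in_2))
    written = 0
    with open_fastq(out_1, "wt") as o1, open_fastq(out_2, "wt") as o2:
        for name, rec in records(in_1):
            mate = mates.get(name)
            if mate is not None:
                o1.write(rec)
                o2.write(mate)
                written += 1
    return written
```

A call such as `resync("file_1.fastq.gz", "file_2.fastq.gz", "sync_1.fastq.gz", "sync_2.fastq.gz")` would write two new files containing only the reads whose mate is present; the output file names here are placeholders.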
Answering my own question:
Both files, file_1.fastq.gz and file_2.fastq.gz, at least have the same number of lines.
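An equal line count does not by itself show that the mates are in the same order. A minimal sketch of a streaming check is below; it reads both files in parallel, so memory use stays small, though it still has to decompress everything once. It assumes plain 4-line FASTQ records and that both mates share the same first header token apart from an optional /1 or /2 suffix. The function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def pair_key(header):
    """Reduce a header line to a name shared by both mates (assumed layout)."""
    fields = header.split()
    if not fields:
        return ""
    name = fields[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_names(path):
    """Yield the read name of every record (every 4th line, starting at 0)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                yield pair_key(line)


def first_mismatch(path_1, path_2):
    """Return (record_index, name_1, name_2) at the first disagreement.

    Returns None when both files list the same names in the same order.
    A missing record on one side shows up as None in the tuple.
    """
    pairs = zip_longest(read_names(path_1), read_names(path_2))
    for idx, (a, b) in enumerate(pairs):
        if a != b:
            return idx, a, b
    return None
```

If `first_mismatch("file_1.fastq.gz", "file_2.fastq.gz")` returns None, the two files are synchronized under the assumptions above; otherwise the returned index points at the first record where they diverge.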