It seems obvious that SRR346373_1.fastq and SRR346373_2.fastq are the two files of a paired-end data set. But what does SRR346373.fastq stand for? It is much smaller than the other two FASTQ files (about 1/20 of their line count). Does anyone know what it means?
It looks like SRR346373 is the first read, SRR346373_1 is the second read and SRR346373_2 is the barcode. The NCBI page you link to has details associating each barcode sequence with the sample and replicate.
I don't think so. SRR346373_1.fastq and SRR346373_2.fastq both have 87354416 lines, while SRR346373.fastq has only 4213292 lines. It's possible that SRR346373_1.fastq is paired with SRR346373_2.fastq, but if SRR346373.fastq were the barcode file, how could it have so few lines?
I read the NCBI page about the barcodes and tried to split the barcode file, but if the barcode file can't be mapped to the paired-end files line by line, I don't think that makes sense.
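The line counts quoted above can be turned into read counts by dividing by four, since a FASTQ record normally occupies four lines. Here is a minimal sketch, assuming unwrapped four-line records; `open_fastq` and `count_reads` are hypothetical helper names, not part of any SRA tool:

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count records, assuming every record spans exactly 4 lines."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A line count that is not a multiple of 4 suggests truncation
        # or wrapped sequence lines, which this sketch does not handle.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Under that assumption, 87354416 lines correspond to 21838604 reads in each of the two large files, and 4213292 lines correspond to 1053323 reads in the small one, which matches the roughly 1/20 ratio mentioned in the question.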
Sorry to revive this old thread, but I noticed something new that is relevant to it. In the past, when I used wget and a local fastq-dump, I usually got only _1.fastq.gz and _2.fastq.gz, and sometimes also a third file for the single reads. However, in my recent direct use of fastq-dump (v2.6.3) from the NCBI server, running /fastq-dump with --split-files --gzip sraID (no choice, as the FTP URL is no longer available), I got _1.fastq.gz and _3.fastq.gz (instead of _2), which seem to represent the paired-end sequences. In agreement with this, the SRA record indicates that the barcode is between the two reads. So I guess that in this case _1 and _3 are the paired-end sequences when --split-files is used. I haven't tried --split-3; perhaps it will produce _1, _2 and the third file. Below is the output of the first read from both _1 and _3.
I also noticed that this is much slower than wget, and I will try the option of converting FASTQ to fastq.gz locally. Any comments or corrections are appreciated.
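For the local conversion from FASTQ to fastq.gz mentioned above, one possible sketch uses only the Python standard library. `gzip_fastq` is a hypothetical helper name, and whether this ends up faster than letting fastq-dump compress on the fly is an assumption that would need to be measured:

```python
import gzip
import shutil


def gzip_fastq(src, dst=None):
    """Compress src to dst (default: src + '.gz') and return dst."""
    if dst is None:
        dst = src + ".gz"
    # Stream in binary mode so large files are never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The original uncompressed file is left in place, so it can be deleted only after the compressed copy has been checked.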
I'd guess it is a file of the remaining unpaired reads.
The _1 and _2 files should have the same sequence IDs in the same order. The third file contains reads for which a paired sequence was not generated, and it may contain reads labeled either /1 or /2.
Structuring the data this way saves having to do an uneven traversal of the two files: you can always assume that the 200th read in the _1 file corresponds to the 200th read in the _2 file.
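The claim that the Nth read in _1 matches the Nth read in _2 can be checked directly. This is a rough sketch, assuming uncompressed four-line records and headers whose first whitespace-separated token is the read ID, optionally ending in /1 or /2; `read_ids` and `first_mismatch` are hypothetical names:

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID from each header line (every 4th line)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue
            tokens = line[1:].split()  # drop the leading '@'
            name = tokens[0] if tokens else ""
            # Ignore a trailing mate label so both files compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(path_1, path_2):
    """Return the 1-based index of the first unpaired record, or None."""
    pairs = zip_longest(read_ids(path_1), read_ids(path_2))
    for index, (id_1, id_2) in enumerate(pairs, start=1):
        if id_1 != id_2:  # also triggers when one file is longer
            return index
    return None
```

A result of `None` means the two files line up record for record; any other value points at the first position where they diverge.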
Since this is AB_SOLiD data, the _1 file is the forward [F3] read (T prefix) and the _2 file is the reverse [R3] read (G prefix).
If in doubt grab the fastq files from ENA directly.