Raw data: http://www.ebi.ac.uk/ena/data/view/SRR346373&display=html
Also on NCBI: http://www.ncbi.nlm.nih.gov/sra?term=%09SRR346373
I downloaded them, and the first 4 lines of each file look like this:
SRR346373$ head -4 S*fastq
==> SRR346373_1.fastq <==
@SRR346373.13045 0176_20090623_2_H3K4me3_28_21_20/1
T23133223302220122222232212322320332
+
!%(#$%#$%%####*%#%##&#$##$##&#&#$$,+
==> SRR346373_2.fastq <==
@SRR346373.13045 0176_20090623_2_H3K4me3_28_21_20/2
G0012130112
+
!*)&#$&'###
==> SRR346373.fastq <==
@SRR346373.1 0176_20090623_2_H3K4me3_3_25_119/1
T30200011130100000000000000000000000
+
!%/%%5)&4(%#(7&?2&'6&.,684;.6>',7A?1
It seems obvious that the _1 and _2 fastq files are paired-end data. But what does SRR346373.fastq stand for? It is much smaller than the other two fastq files (about 1/20 of their line count). Does anyone know what it means?
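To compare the three files by more than their size, it helps to read them record by record. Below is a minimal Python sketch, assuming uncompressed FASTQ with exactly four lines per record (no line wrapping), as in the head output above. `read_fastq` and `spot_id` are hypothetical helper names, not part of any SRA tool. The spot name is taken to be the first whitespace-separated token of the header, e.g. SRR346373.13045.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each 4-line FASTQ record.

    Assumes records are not line-wrapped (hypothetical helper).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        yield header, seq, qual


def spot_id(header: str) -> str:
    """'@SRR346373.13045 0176_..._20/2' -> 'SRR346373.13045'."""
    return header[1:].split()[0]


# Example: the first record of SRR346373_2.fastq shown above.
example = io.StringIO(
    "@SRR346373.13045 0176_20090623_2_H3K4me3_28_21_20/2\n"
    "G0012130112\n"
    "+\n"
    "!*)&#$&'###\n"
)
for header, seq, qual in read_fastq(example):
    print(spot_id(header), len(seq))  # SRR346373.13045 11
```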
It looks like SRR346373 is the first read, SRR346373_1 is the second read and SRR346373_2 is the barcode. The NCBI page you link to has details associating each barcode sequence with the sample and replicate.
I don't think so: SRR346373_1.fastq and SRR346373_2.fastq both have 87354416 lines, while SRR346373.fastq has only 4213292 lines. It's possible that SRR346373_1.fastq is paired with SRR346373_2.fastq, but if SRR346373.fastq were the barcode file, how could it have so few lines?
I read the NCBI page about barcodes and tried to split the barcode file, but if the barcode file can't be mapped to the paired-end files line by line, I don't think that makes sense.
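Since each FASTQ record is four lines, the counts above correspond to 87354416 / 4 = 21,838,604 records in each of _1 and _2, versus 4213292 / 4 = 1,053,323 records in SRR346373.fastq. Whether two files really pair "line by line" can be checked by walking them in lockstep and comparing spot names. A minimal sketch, assuming uncompressed, non-wrapped FASTQ; `spot_names`, `pairs_line_by_line` and `reads_from_lines` are hypothetical names, and the sequences in the example are truncated from the head output above for brevity.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO


def spot_names(handle: TextIO) -> Iterator[str]:
    """Yield the spot name (first token of the '@' header) of every 4-line record."""
    for line_no, line in enumerate(handle):
        if line_no % 4 == 0:  # header line of each record
            yield line[1:].split()[0]


def pairs_line_by_line(a: TextIO, b: TextIO) -> bool:
    """True if both files list exactly the same spots in the same order."""
    for name_a, name_b in zip_longest(spot_names(a), spot_names(b)):
        if name_a != name_b:  # mismatch, or one file ran out first (None)
            return False
    return True


def reads_from_lines(n_lines: int) -> int:
    """Convert a FASTQ line count (e.g. from wc -l) into a record count."""
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4")
    return n_lines // 4


# Headers from the head output above; sequences/qualities truncated.
R1 = "@SRR346373.13045 0176_20090623_2_H3K4me3_28_21_20/1\nT2313\n+\n!%(#$\n"
R2 = "@SRR346373.13045 0176_20090623_2_H3K4me3_28_21_20/2\nG0012\n+\n!*)&#\n"
SINGLE = "@SRR346373.1 0176_20090623_2_H3K4me3_3_25_119/1\nT3020\n+\n!%/%%\n"

print(pairs_line_by_line(io.StringIO(R1), io.StringIO(R2)))      # True
print(pairs_line_by_line(io.StringIO(R1), io.StringIO(SINGLE)))  # False
print(reads_from_lines(87354416), reads_from_lines(4213292))     # 21838604 1053323
```

On the real files the same functions can be called with open("SRR346373_1.fastq") and so on; only the spot names are held at any one time, so memory use stays flat.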
Hi all,
Sorry to bring you back to this old thread, but I noticed something new that is relevant to it. In the past, when I used wget and a local fastq-dump, I usually got only the _1.fastq.gz and _2.fastq.gz files, and sometimes also a 3rd file for the single reads. However, in my recent direct use of fastq-dump (v2.6.3) from the NCBI server, running /fastq-dump --split-files --gzip sraID (I had no choice, as the ftp URL is no longer available), I got _1.fastq.gz and _3.fastq.gz (instead of _2), which seem to represent the paired-end sequences. In agreement with this, the SRA record indicates that the barcode is between the two reads. So I guess that in this case _1 and _3 are the paired-end sequences when --split-files is used. I haven't tried --split-3; perhaps it will produce _1 and _2 plus the 3rd file. Below is the output of the first read from both _1 and _3.
I also noticed that this is much slower than wget, and will try the option of converting fastq to fastq.gz locally. Any comments/corrections are appreciated. Thanks a lot.
Ping
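One way to check which --split-files outputs are the genomic reads and which, if any, is a short technical read such as a barcode is to compare record counts and read lengths per file. A minimal sketch, assuming 4-line FASTQ records, plain or gzip-compressed (as written with --gzip), and assuming the barcode read is noticeably shorter than the genomic reads. The 11-character record at the top of SRR346373_2.fastq above would fit that pattern, but that is an assumption, not something the SRA record states. `summarize_fastq`, its `max_records` cap and the file pattern are hypothetical.

```python
import gzip
import os
from typing import Dict, Optional, TextIO


def _open_text(path: str) -> TextIO:
    """Open plain or gzip-compressed FASTQ as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def summarize_fastq(path: str, max_records: Optional[int] = None) -> Dict[str, object]:
    """Count records and track the read-length range of a 4-line FASTQ file.

    max_records (hypothetical cap) stops early so large files scan quickly.
    """
    count, min_len, max_len, first_spot = 0, None, None, None
    with _open_text(path) as handle:
        for line_no, line in enumerate(handle):
            kind = line_no % 4
            if kind == 0:  # header line
                if max_records is not None and count >= max_records:
                    break
                if first_spot is None:
                    first_spot = line[1:].split()[0]
            elif kind == 1:  # sequence line
                length = len(line.rstrip("\n"))
                min_len = length if min_len is None else min(min_len, length)
                max_len = length if max_len is None else max(max_len, length)
                count += 1
    return {"file": os.path.basename(path), "records": count,
            "first_spot": first_spot, "min_len": min_len, "max_len": max_len}


if __name__ == "__main__":
    import glob
    # Hypothetical pattern for fastq-dump --split-files --gzip output.
    for path in sorted(glob.glob("SRR346373_*.fastq.gz")):
        print(summarize_fastq(path, max_records=100_000))
```

Files that come out of the same spots should report the same first_spot and record count; a file whose min_len/max_len is much smaller than the others would be the candidate barcode rather than a mate.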
If in doubt, grab the fastq files from ENA directly.