The easiest way to quickly check the run metadata is:
bio search SRR12920588
It reports:
[
    {
        "run_accession": "SRR12920588",
        "sample_accession": "SAMN16578971",
        "first_public": "2021-01-04",
        "country": "",
        "sample_alias": "ChIP polE TC H3K56Q_2 5",
        "fastq_bytes": "7406752;6470603",
        "read_count": "206418",
        "library_name": "ChIP polE TC H3K56Q_2 5",
        "library_strategy": "ChIP-Seq",
        "library_source": "GENOMIC",
        "library_layout": "PAIRED",
        "instrument_platform": "ILLUMINA",
        "instrument_model": "Illumina NovaSeq 6000",
        "study_title": "Rtt109 effect on replication speed via histone acetylation",
        "fastq_ftp": "ftp.sra.ebi.ac.uk/vol1/fastq/SRR129/088/SRR12920588/SRR12920588_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR129/088/SRR12920588/SRR12920588_2.fastq.gz"
    }
]
Now we know that the run is paired-end, which means we get at least two files, but possibly more; sometimes the sample index reads are included as well.
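The same check can be scripted. Below is a minimal Python sketch, assuming the JSON printed by bio search has been captured as text. The helper name `describe_run` is hypothetical, and only three of the fields from the record are used.

```python
import json

# Trimmed copy of the metadata record (only the fields used here).
record_json = """
[
  {
    "run_accession": "SRR12920588",
    "library_layout": "PAIRED",
    "fastq_ftp": "ftp.sra.ebi.ac.uk/vol1/fastq/SRR129/088/SRR12920588/SRR12920588_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR129/088/SRR12920588/SRR12920588_2.fastq.gz"
  }
]
"""

def describe_run(record):
    """Return (accession, layout, list of FASTQ URLs) for one metadata record."""
    # fastq_ftp is a semicolon-separated string; drop empty entries.
    urls = [u for u in record["fastq_ftp"].split(";") if u]
    return record["run_accession"], record["library_layout"], urls

accession, layout, urls = describe_run(json.loads(record_json)[0])
print(accession, layout, len(urls))
```

The number of listed files is only a hint: it describes what the archive stores, not what a particular download command will write to disk.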
Now let the bioinformatics begin:
fastq-dump -X 10 SRR12920588
seqkit stats SRR12920588.fastq
will print:
file               format  type  num_seqs  sum_len  min_len  avg_len  max_len
SRR12920588.fastq  FASTQ   DNA         10    1,020      102      102      102
It looks like we got a single file called SRR12920588.fastq with 10 records, where each sequence is 102 bp long. The paired reads are concatenated into a single long sequence.
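To illustrate what "concatenated" means, here is a small Python sketch that cuts one such record back into its two mates. It assumes both mates have the same length, which fits the numbers here (102 = 2 x 51) but is not guaranteed for every run. The function name `split_spot` and the example sequence are made up for illustration.

```python
def split_spot(sequence, quality, n_reads=2):
    """Split one concatenated spot into n_reads equal-length (sequence, quality) mates."""
    if not sequence:
        raise ValueError("empty sequence")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality differ in length")
    if len(sequence) % n_reads != 0:
        raise ValueError("spot length is not divisible by the number of reads")
    size = len(sequence) // n_reads
    return [(sequence[i:i + size], quality[i:i + size])
            for i in range(0, len(sequence), size)]

# Made-up 102 bp spot: 51 A's followed by 51 C's.
spot_seq = "A" * 51 + "C" * 51
spot_qual = "I" * 102

mates = split_spot(spot_seq, spot_qual)
print([len(seq) for seq, _ in mates])
```

In practice it is safer to let the download tool do the splitting, since it knows the real read boundaries.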
If we pass the --split-spot flag to the same command:
fastq-dump -X 10 --split-spot SRR12920588
seqkit stats SRR12920588.fastq
the results will be:
file               format  type  num_seqs  sum_len  min_len  avg_len  max_len
SRR12920588.fastq  FASTQ   DNA         20    1,020       51       51       51
Now we again get a single file called SRR12920588.fastq, but this time it has 20 records, where each sequence is 51 bp long. The paired reads follow one another (the so-called interleaved format).
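An interleaved file can be separated into its two mates by taking every other record. The Python sketch below shows the idea on a made-up two-pair example. It assumes plain four-line FASTQ records, and the names `read_fastq` and `deinterleave` are hypothetical helpers, not part of any tool used here.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, four lines per record."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in chunk)
        yield header, seq, qual

def deinterleave(records):
    """Split interleaved records into (first mates, second mates)."""
    records = list(records)
    if len(records) % 2:
        raise ValueError("odd number of records, file is not cleanly interleaved")
    return records[0::2], records[1::2]

# Made-up interleaved data: two pairs, mates follow one another.
interleaved = """@r1/1
ACGT
+
IIII
@r1/2
TTTT
+
IIII
@r2/1
GGGG
+
IIII
@r2/2
CCCC
+
IIII"""

first, second = deinterleave(read_fastq(interleaved.splitlines()))
print(len(first), len(second))
```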
But if we pass the --split-files flag instead:
fastq-dump -X 10 --split-files SRR12920588
seqkit stats SRR12920588_*
the command will now produce two files, each with 10 reads that are 51 bp long:
file                 format  type  num_seqs  sum_len  min_len  avg_len  max_len
SRR12920588_1.fastq  FASTQ   DNA         10      510       51       51       51
SRR12920588_2.fastq  FASTQ   DNA         10      510       51       51       51
You can also pass the --split-3 flag, which might produce more files (a third file holding reads that were left without a mate), but not in this case.
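As I understand the documented behavior of --split-3, spots with two reads go to the _1 and _2 files, and a spot with a single read goes to a separate file without a suffix. The Python sketch below only mimics that routing decision. It is an illustration under that assumption, not fastq-dump's actual implementation, and `route_spot` is a made-up name.

```python
def route_spot(reads):
    """Map each read of one spot to an output file suffix.

    reads: list of sequences for a single spot (after any filtering).
    Returns a dict of {file suffix: sequence}.
    """
    if len(reads) == 2:
        # A complete pair: first mate to _1, second mate to _2.
        return {"_1.fastq": reads[0], "_2.fastq": reads[1]}
    if len(reads) == 1:
        # An orphan read ends up in the file without a numeric suffix.
        return {".fastq": reads[0]}
    # Anything else is skipped in this sketch.
    return {}

paired = route_spot(["ACGT", "TTTT"])
orphan = route_spot(["ACGT"])
print(sorted(paired), sorted(orphan))
```

For a run where every spot has both mates, as here, the third file is simply never created.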
As a rule, run read statistics to understand what is inside your file.
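If seqkit is not at hand, the same basic numbers are easy to compute. Below is a rough Python sketch that reports the same columns as the tables above. It assumes four-line FASTQ records (no wrapped sequences) and uses made-up data. `fastq_stats` is a hypothetical helper, not a seqkit function.

```python
def fastq_stats(lines):
    """Compute record count and length statistics from four-line FASTQ records."""
    # The sequence is the second line of every four-line record.
    lengths = [len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1]
    if not lengths:
        raise ValueError("no records found")
    total = sum(lengths)
    return {
        "num_seqs": len(lengths),
        "sum_len": total,
        "min_len": min(lengths),
        "avg_len": total / len(lengths),
        "max_len": max(lengths),
    }

# Made-up input: two records of 51 bp each.
lines = []
for name in ("r1", "r2"):
    lines += [f"@{name}", "A" * 51, "+", "I" * 51]

stats = fastq_stats(lines)
print(stats)
```

Comparing num_seqs and the read lengths against the read count and layout in the metadata is usually enough to tell concatenated, interleaved and split output apart.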
I think it is paired-end data; for SRR12920588 the layout is listed as paired. For SRR17115876, though, the layout is listed as single, yet the run browser shows Reads (joined) with an option to view the reads separately, so it looks paired-end to me.