Hello, I need to retrieve 42 fastq files from NCBI SRA: https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=13674977
I retrieved the SRA accession numbers for the sequencing data and saved them to a file called "SraAccList.txt". The paper's methods section says they worked with paired-end sequencing data, so I did the following to retrieve the fastq files:
# Download each run listed in SraAccList.txt (one accession per line)
while read -r accs
do
prefetch "$accs"
done < SraAccList.txt
Then, for the retrieved .sra files, I used fastq-dump to get the paired-end reads:
# --split-3 writes mates to _1.fastq/_2.fastq and unpaired reads to a plain .fastq
for f in *.sra
do
fastq-dump --split-3 "$f"
done
However, I only got SRR{numbers}.fastq files and no paired-end read files.
Other similar threads discuss cases where the submitters didn't provide the full fastq data, but I'm not sure whether that applies here, or whether the retrieved fastq files are in interleaved format or are just single-end data.
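One rough way to tell interleaved from single-end output is to compare the read names of consecutive records: in an interleaved file every two records share the same name. The sketch below is only that, a sketch. The function name guess_layout is made up for illustration, it assumes plain 4-line FASTQ records, and it only looks at the first 1000 records.

```shell
# Hypothetical helper: guess whether a FASTQ file is interleaved or single-end
# by comparing the read names of consecutive records.
# Assumption: every record is exactly 4 lines (no wrapped sequence lines).
guess_layout() {
    head -n 4000 "$1" | awk '
        NR % 4 == 1 {                     # header line of each record
            name = $1                     # read name up to the first space
            sub(/\/[12]$/, "", name)      # drop a trailing /1 or /2 if present
            n++
            if (n % 2 == 0 && name == prev) pairs++
            prev = name
        }
        END {
            # interleaved: every even-numbered record repeats the previous name
            if (n >= 2 && pairs == int(n / 2)) print "interleaved"
            else print "single-end"
        }'
}

# Example use (commented out so that sourcing this file has no side effects):
# for f in *.fastq; do printf '%s\t%s\n' "$f" "$(guess_layout "$f")"; done
```

A "single-end" answer here only says the file holds one read per name; it cannot say whether the mate exists somewhere else, for example under a different accession.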
I took a look into the run information page of the 42 SRR accessions and they are labeled as PAIRED sequencing data: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR10912829
However, it definitely looks like they only provided single-end data:
I compared these SRR runs to a well-established published one, where the submitters provide both reads of each pair, as shown by the green bars:
So it seems the submitters didn't provide the complete sequencing data. Is this correct?
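A possible way to cross-check the PAIRED label for all 42 runs at once is to look at the run metadata table instead of the individual run pages. This is a sketch under assumptions: it expects the metadata to have been saved as a RunInfo CSV (the file name SraRunInfo.csv is just an example), the function name summarise_runinfo is made up, and the column names Run, LibraryLayout and spots_with_mates are what I would expect in that export but are not checked against this project. It also assumes no field contains a quoted comma.

```shell
# Hypothetical helper: print run accession, declared layout and number of
# spots with mates from an SRA RunInfo CSV.
# Assumptions: comma-separated, header in the first line, no quoted commas,
# and columns named Run, LibraryLayout and spots_with_mates.
summarise_runinfo() {
    awk -F',' '
        NR == 1 {
            # map column names to positions so column order does not matter
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Run" in col) || !("LibraryLayout" in col) || !("spots_with_mates" in col)) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $(col["Run"]) != "" {
            printf "%s\t%s\t%s\n", $(col["Run"]), $(col["LibraryLayout"]), $(col["spots_with_mates"])
        }' "$1"
}

# Example use (commented out so that sourcing this file has no side effects):
# summarise_runinfo SraRunInfo.csv
```

A run that is labeled PAIRED but reports 0 spots with mates would be consistent with only one read per spot having been uploaded under that accession.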
I hadn't looked at the run browser! They definitely uploaded each file separately, one per SRR accession. Thank you so much.
Nice observation, GenoMax. In bioinformatics we have to expect the unexpected.