Hi All,
How do you check that a FASTQ file is complete after downloading it from the SRA database with fastq-dump? I regularly end up with incomplete FASTQ files after running fastq-dump.
Thanks.
Just to update this: using fastq-dump for downloads is not recommended. It is slow and prone to connection losses. It is better to use prefetch together with Aspera (see here) to get the SRA files, and then use fastq-dump to convert them to FASTQ. That said, you can get most data directly from the European Nucleotide Archive in FASTQ format. Downloading from there is simple and fast; see my tutorial: Fast download of FASTQ files and metadata from the European Nucleotide Archive (ENA). If you have to download from NCBI, e.g. because the data are restricted, go with prefetch followed by parallel-fastq-dump, which is a wrapper that parallelizes fastq-dump. After successfully converting an SRA file to FASTQ, both tools (fastq-dump/parallel-fastq-dump) print a summary message that only shows up if no errors occurred, so as long as that message was printed I have never felt the need to verify the FASTQ file after conversion.
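If you want an extra check on top of seeing that message, here is a minimal Python sketch that compares the spot count from the summary with the number of records in the FASTQ file. It assumes the summary contains a line of the form "Written N spots for ..." (treat the exact wording as an assumption and adjust the pattern to what your version prints), that the FASTQ uses exactly four lines per record, and that each spot yields one record per file (e.g. single-end data or one file of a split pair). The function names are made up for illustration.

```python
import gzip
import re


def written_spots(log_text):
    """Return the spot count from a 'Written N spots' line, or None if absent."""
    match = re.search(r"Written\s+(\d+)\s+spots", log_text)
    return int(match.group(1)) if match else None


def count_fastq_records(path):
    """Count records in a 4-line-per-record FASTQ file (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def fastq_matches_log(fastq_path, log_text):
    """True only if the log reports a spot count and the FASTQ has that many records."""
    expected = written_spots(log_text)
    return expected is not None and count_fastq_records(fastq_path) == expected
```

To use it, save the tool's terminal output to a file, read it into a string, and pass it together with the FASTQ path to fastq_matches_log. A missing summary line is treated as a failure on purpose.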
You can use fqlint to identify a broad range of issues in Illumina-based FASTQ files. Note that it only reports malformed FASTQ files: if your download happens to be interrupted at the exact boundary between two reads, the file is still well-formed and no error will be reported.
To install it, you can do the following after installing Rust.
cargo install --git https://github.com/stjude/fqlib.git
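To make the point about truncation at a read boundary concrete, here is a small Python sketch of a structural FASTQ check. This is not how fqlint is implemented, just an illustration of the general idea, and it assumes four lines per record with no wrapped sequences. The function name check_fastq_structure is hypothetical.

```python
def check_fastq_structure(lines):
    """Return a list of problems found in an iterable of FASTQ lines.

    Assumes four lines per record (no wrapped sequences).
    """
    problems = []
    record = []
    n_records = 0
    for raw in lines:
        record.append(raw.rstrip("\n"))
        if len(record) < 4:
            continue
        n_records += 1
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"record {n_records}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n_records}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {n_records}: sequence and quality lengths differ")
        record = []
    if record:
        # Leftover lines mean the file ended in the middle of a record.
        problems.append(f"record {n_records + 1}: truncated after {len(record)} line(s)")
    return problems


complete = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "GGCC\n", "+\n", "IIII\n"]
print(check_fastq_structure(complete))      # no problems
print(check_fastq_structure(complete[:6]))  # cut mid-record: reported
print(check_fastq_structure(complete[:4]))  # cut at a read boundary: not reported
```

The last call is the blind spot: a file cut exactly between two records passes any structural check, so you still need an independent read count or checksum to catch it.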
You should always check EBI-ENA to see if fastq files are available. For the SRR# you posted below.
see How can I find SRA MD5 checksums for FASQ files?
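If you do get the FASTQ files from ENA, a checksum comparison is the most direct completeness test. Below is a minimal Python sketch, assuming you have already looked up the expected MD5 for the file (as far as I know, ENA lists it per file in its file report, but treat the lookup step as an assumption and check what your report actually contains). The helper names are hypothetical. The checksum applies to the file exactly as served, so it will not match a FASTQ you produced yourself with fastq-dump.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_md5):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat, which matters for files in the tens of gigabytes.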
By the way, how do you resume a broken download with fastq-dump?
It has been 17 months with no answer to this question. I have the same issue when dumping big files (~30G) and don't want to restart the download. How can I resume a broken download with fastq-dump? Best
Thanks. The method you mention works in some cases, but in most situations it doesn't. For example:
if you just download the SRA files, I think it is okay to use