I am trying to convert an SRA file from the PRJNA282735 dataset to FASTQ and I am getting the following error:
fastq-dump.2.1.7 fatal: SIGNAL - Segmentation fault
My fastq-dump command is
fastq-dump --split-3 SRR2016445.sra -O SRR2016445
I have not been able to find a similar error reported elsewhere. The ENA page for some samples in this dataset lists three files per SRX experiment (e.g. SRR2016445.fastq, SRR2016445_1.fastq and SRR2016445_2.fastq).
This is unusual for me, as I usually get one or two files per experiment (depending on whether it is single-end or paired-end) but never three. I am wondering whether this is the reason for the error.
Are you using the latest sratoolkit? NCBI has moved to HTTPS-only connections. With v. 2.8 I get two files when dumping with: fastq-dump --split-3 SRR2016445
I think 2.1.7 is fairly old. With the slightly newer version 2.4.2 I get:
fastq-dump.2.4.2 err: error unexpected while resolving tree within virtual file system module - failed to resolve accession 'SRR2016445' - Obsolete software. See https://github.com/ncbi/sra-tools/wiki ( 406 )
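Since both failures above point at an outdated toolkit, a quick version check may save some time. This is only a minimal sketch: the `version_ge` helper and the `MIN_VERSION` variable are names I made up, the 2.8.0 threshold is an assumption based solely on that version working in this thread, and it relies on `sort -V` (GNU coreutils) being available.

```shell
# Succeed (exit 0) if version $1 is greater than or equal to version $2.
# Relies on the version-aware ordering of `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumption: 2.8.0 is used as the minimum only because it worked in this thread.
MIN_VERSION="2.8.0"

if command -v fastq-dump >/dev/null 2>&1; then
    # Take the first x.y.z string found in the version banner.
    installed="$(fastq-dump --version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
    if version_ge "$installed" "$MIN_VERSION"; then
        echo "fastq-dump $installed looks recent enough"
    else
        echo "fastq-dump $installed is older than $MIN_VERSION; consider upgrading"
    fi
else
    echo "fastq-dump not found on PATH"
fi
```

A plain string comparison would rank 2.10.0 below 2.8.0, which is why the helper goes through `sort -V`.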
Please note that most SRA files are not self-contained: they depend on a reference sequence, which is a separate download. Thus it is not enough to download the SRA file with wget. 'fastq-dump' will try to download the reference sequence behind the scenes before it extracts any reads. The reference sequence for SRR2016445 is https://www.ncbi.nlm.nih.gov/nuccore/149361431.
A reference file was downloaded into the ~/public/refseq/ folder and the SRR file into the ~/public/sra/ folder. I could split the SRR file into three fastq files using fastq-dump 2.8.0. I guess the small fastq file without the '_1' or '_2' suffix consists of unpaired reads.
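To check the guess about the third file, one could count the reads in each output of the prefetch-then-dump workflow. Here is a rough sketch, not a definitive recipe: `count_reads` and `dump_and_count` are hypothetical helper names, and the count assumes the usual four lines per FASTQ record. With --split-3 the two mate files should report the same number, and the file without a suffix should hold the leftover reads.

```shell
# Count FASTQ records in a file, assuming 4 lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Download one run, split it, and report read counts per output file.
# $1 is the run accession; output goes to a directory named after it.
dump_and_count() {
    acc="$1"
    prefetch "$acc" || return 1
    fastq-dump --split-3 -O "$acc" "$acc" || return 1
    for f in "$acc/${acc}_1.fastq" "$acc/${acc}_2.fastq" "$acc/${acc}.fastq"; do
        if [ -f "$f" ]; then
            echo "$f: $(count_reads "$f") reads"
        fi
    done
}

# Uncomment to run (needs network access and the SRA Toolkit on PATH):
# dump_and_count SRR2016445
```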
For some reason, I am not able to convert the reference file 'NC_000072' from binary to fasta using fastq-dump.
P.S. fastq-dump does not work very well for downloading. It downloads both the SRR file and the reference file just like the prefetch command, but the files retain the .cache extension, which I believe indicates an incomplete download.
I noticed some time ago that the .cache files would always be placed in your home directory by fastq-dump, even when you are writing the output to a possibly much larger partition. This can easily fill up your home directory, and fastq-dump won't remove the files. I would therefore try running fastq-dump like this: HOME=./ fastq-dump --split-3 SRR2016445
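Before changing HOME, it may be worth looking at what is already sitting there. The sketch below just lists leftover .cache files; `list_cache_files` is a name I made up, and `$HOME/ncbi/public` is an assumed location that should be swapped for wherever your toolkit actually writes.

```shell
# List every *.cache file below the given directory, one path per line.
list_cache_files() {
    find "$1" -type f -name '*.cache'
}

# Assumption: the toolkit's download area lives under ~/ncbi/public.
cache_dir="$HOME/ncbi/public"

if [ -d "$cache_dir" ]; then
    list_cache_files "$cache_dir"
else
    echo "no directory at $cache_dir"
fi
```

Once you are sure no download is still running, the listed files can be removed by hand.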
Running vdb-config -i allows one to choose directories that will be used by SRAtoolkit. This needs to be done once and will require X-windows (if run with -i). If you want a pure text version run vdb-config -i --interactive-mode textual.
The configuration looks fine. The default path is ncbi/public. There is no proxy and the rest of the settings are at their defaults. So why fastq-dump didn't download the SRR file properly (without the .cache extension) and why the reference file was not converted to fasta is a mystery to me. For now, I can survive with this three-step process (prefetch/fastq-dump/vdb-dump).
P.S. fastq-dump works fine for other datasets which don't have a refseq file.
Brave people will just edit '~/.ncbi/user-settings.mkfg' with their favorite text editor. Needing 'vdb-config' to modify a simple config file is over-engineered.
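For anyone who wants to look before editing, the file can simply be read with grep. This is a sketch under an assumption: I expect repository locations to appear as lines of the form /repository/.../root = "path", but I have not checked that against every toolkit version, so adjust the pattern to whatever your file contains. `show_repo_roots` is a hypothetical helper name.

```shell
# Print repository root settings from an SRA Toolkit user config file.
# Assumption: roots are stored on lines like /repository/.../root = "<path>".
show_repo_roots() {
    if [ -f "$1" ]; then
        grep -E '^/repository/.*/root[[:space:]]*=' "$1"
    else
        echo "no config file at $1" >&2
        return 1
    fi
}

# Path taken from the post above; a missing file or no match is not fatal here.
show_repo_roots "$HOME/.ncbi/user-settings.mkfg" || true
```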