Question

Problems with downloading fastq files from sra-toolkit

3.4 years ago

pratarora

I have been trying to download the fastq files of a single-cell RNA-seq experiment (SRX8632237), which has the following SRA runs:

SRR12108143
SRR12108144
SRR12108145
SRR12108146

The run pages show 3 reads per spot. However, I am unable to download the three reads as separate fastq files using sra-toolkit (2.10.7); I always get a single file.

I have tried multiple combinations of fasterq-dump and fastq-dump, with and without prefetch, using the options --split-files, --split-spot, --split-e, --split-3, --concatenate-reads, --include-technical and --skip-technical. None of these combinations splits the output into one file per read.

Could someone please help me resolve this problem and let me know where I might be going wrong?

Thanks in advance!!

sra-toolkit 10X-Genomics NCBI single-cell

If you look under the Data access tab, you will see the fastq files in their originally submitted format. If you have a Google Cloud or AWS account, you can download the original files from there.

3.4 years ago by GenoMax

Thank you for your quick reply!

I checked the AWS and GCP links, but the files are large and won't fit in the AWS free tier. Is there any other way?

3.4 years ago by pratarora

score 1 · Answer 1 · 2021-08-24

It works fine for me. Just using --split-files is enough in most cases:

fastq-dump --split-files SRR12108143
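The same command can be repeated for every run of SRX8632237. This is a minimal sketch rather than a definitive recipe: the function name `dump_runs` and the `FASTQ_DUMP` variable are hypothetical, added only so the binary can be swapped for `echo` as a dry run, and `--gzip` is an optional fastq-dump flag used here just to save disk space.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run fastq-dump --split-files on each accession given.
# FASTQ_DUMP defaults to the real binary; set it to "echo" for a dry run.
dump_runs() {
    local run
    for run in "$@"; do
        # --split-files: one output file per read (<run>_1, <run>_2, <run>_3)
        # --gzip: compress the output (.fastq.gz) to save disk space
        "${FASTQ_DUMP:-fastq-dump}" --split-files --gzip "$run" || return 1
    done
}
```

For example, `dump_runs SRR12108143 SRR12108144 SRR12108145 SRR12108146` should leave three files per run, `<run>_1.fastq.gz` to `<run>_3.fastq.gz`.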

head -20 SRR12108143_1.fastq
@SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=8
TTCAGGTG
+SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=8
AAA/AEEE
@SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=8
TTCAGGTG
+SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=8
AAAAAEE/
@SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=8
TTCAGGTG
+SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=8
/A/AAE/E
@SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=8
TTCAGGTG
+SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=8
/6/6A/66
@SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=8
TTCAGGTG
+SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=8
6AA6AEEE

head -20 SRR12108143_2.fastq
@SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=26
GGAGCNAAGTATTGGACTGCACGAGG
+SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=26
AAAAA#EEEEEEEEEEEEEEEEEEEE
@SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=26
CGGACNCCAAACCTACACCCACGCGC
+SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=26
A6AAA#/EEEE/EEEEEA/EAA/EEE
@SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=26
ACCTTNACACGTAAGGAAGCTTTTGC
+SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=26
6AAAA#EE/EEEEE/EE6EEEEEEEE
@SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=26
GCTGTNCCTCGTCTTCTTCACTAAAG
+SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=26
A6AAA#E6EE/EEEEEE/AEAEE//E
@SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=26
ACCGTNACACAAGCCCTTACGTTGTG
+SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=26
/AAAA#EEEEAEEEEEEAEEEEEEEE

head -200 SRR12108143_3.fastq | tail -20
@SRR12108143.46 NS500717:249:HT5TNBGX5:1:11101:25528:1060 length=49
NGAGCCAAAGCCCCCAGTGTTTGTATTTTGACGCCAAGCTTCACTTTAA
+SRR12108143.46 NS500717:249:HT5TNBGX5:1:11101:25528:1060 length=49
#A//AEE/EEEA6EE<EE///<A/<E/AEEA//<<//EAE<//6E/EEE
@SRR12108143.47 NS500717:249:HT5TNBGX5:1:11101:22932:1060 length=49
NATCACTACTCTAGGAAGGAAGGAAAACCTGACAACCTGAGCAAGAAGG
+SRR12108143.47 NS500717:249:HT5TNBGX5:1:11101:22932:1060 length=49
#A/AA/EE///</E//E/<////E///E/EA</<//<E/6<<//EE/A<
@SRR12108143.48 NS500717:249:HT5TNBGX5:1:11101:15378:1060 length=49
NAAAACATGATTGTCGAGGAGTGTGGCTGCTCCTAGAGTCGCGAGGTAC
+SRR12108143.48 NS500717:249:HT5TNBGX5:1:11101:15378:1060 length=49
#//AAA/AEAEA/A/EEE//A//AE<AAAAE<AA/E/EE</6///66/A
@SRR12108143.49 NS500717:249:HT5TNBGX5:1:11101:20756:1060 length=49
NGATCAACATTACTTATTGTTTGAATACTACGACAACTAAAATTTCACT
+SRR12108143.49 NS500717:249:HT5TNBGX5:1:11101:20756:1060 length=49
#/AAAEE6//EAE<<AA<////////<//E6/<////AAAE6//A/<</
@SRR12108143.50 NS500717:249:HT5TNBGX5:1:11101:11867:1061 length=49
NTATAAAGATGTTTAAAAAAGTCAGTTGCTTTTTTCGCGTAATGTGAAT
+SRR12108143.50 NS500717:249:HT5TNBGX5:1:11101:11867:1061 length=49
#A/AA/E/EE////E/EE/A//EEEEEEE/<<//A/E/AEE6A/AEEA<
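The read lengths in the headers above (8, 26 and 49 bp) give a quick way to check whether a download was split correctly. If the three reads had instead been concatenated into one file, every record should show a single length of 83 bp (8 + 26 + 49). A minimal sketch, assuming uncompressed FASTQ with strict four-line records; `read_length_summary` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<length> <count>" for each distinct read length
# in the given FASTQ file(s), or in FASTQ read from stdin if no file is given.
# Assumes uncompressed input with strict 4-line records.
read_length_summary() {
    # Line 2 of every 4-line record (per file, hence FNR) is the sequence.
    awk 'FNR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$@" | sort -n
}
```

For example, `read_length_summary SRR12108143_2.fastq`, or for gzipped output, `zcat SRR12108143_2.fastq.gz | read_length_summary`.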

score 0 · Answer 2 · 2021-08-24

Kevin's command above worked for me as well, with or without prefetch. How about installing the latest version of the SRA Toolkit instead of version 2.10.7? Version 2.11.1 is available now. Also, I don't think you need to download the data through AWS in this case. The storage limits of the free account caused me to panic...
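If upgrading alone does not help, the technical-read setting may be worth checking. As far as I know, fasterq-dump skips reads flagged as technical unless `--include-technical` is given, and a short index read such as the 8 bp one here may be flagged that way, which would leave fewer files than reads per spot. A minimal sketch, assuming SRA Toolkit 2.11.x; `prefetch_and_split` and the `PREFETCH`/`FASTERQ_DUMP` variables are hypothetical names, the variables existing only to allow a dry run with `echo`:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: prefetch one run, then split it with fasterq-dump.
# PREFETCH / FASTERQ_DUMP default to the real binaries; set both to "echo"
# for a dry run.
prefetch_and_split() {
    local run="$1"
    # Optional: download the .sra file locally first; fasterq-dump should
    # pick up the local copy when run from the same directory.
    "${PREFETCH:-prefetch}" "$run" || return 1
    # --split-files: one FASTQ per read
    # --include-technical: keep reads flagged as technical (e.g. index reads)
    "${FASTERQ_DUMP:-fasterq-dump}" --split-files --include-technical "$run"
}
```

Running `fasterq-dump --version` first confirms which version is actually on the PATH before calling, say, `prefetch_and_split SRR12108143`.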