Problems with downloading fastq files from sra-toolkit
3.3 years ago
pratarora • 0

I have been trying to download the fastq files of a single-cell RNA experiment, SRX8632237, which has the following SRA runs:

  • SRR12108143
  • SRR12108144
  • SRR12108145
  • SRR12108146

The run pages show 3 reads per spot (link). However, I am unable to download the three fastq files separately using sra-toolkit (2.10.7); I always get one file.

I have tried multiple combinations of fasterq-dump and fastq-dump with the options --split-files, --split-spot, --split-e, --split-3, --concatenate-reads, --include-technical and --skip-technical, both with and without prefetch. None of the combinations splits the files according to the reads.

Could someone please help me resolve this problem and let me know where I might be going wrong?

Thanks in advance!!

sra-toolkit 10X-Genomics NCBI single-cell • 2.0k views

If you look under the Data access tab, you will see the fastq files in the format in which they were originally submitted. If you have a Google Cloud or AWS account, you can download the original files from there.


Thank you for your quick reply!

I checked the AWS and GCP links, but the files are big and won't really fit in the free tier of AWS. Is there any other way?

3.3 years ago

It works AOK for me. Just using --split-files works for most cases:

fastq-dump --split-files SRR12108143

head -20 SRR12108143_1.fastq
@SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=8
TTCAGGTG
+SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=8
AAA/AEEE
@SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=8
TTCAGGTG
+SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=8
AAAAAEE/
@SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=8
TTCAGGTG
+SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=8
/A/AAE/E
@SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=8
TTCAGGTG
+SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=8
/6/6A/66
@SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=8
TTCAGGTG
+SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=8
6AA6AEEE

head -20 SRR12108143_2.fastq
@SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=26
GGAGCNAAGTATTGGACTGCACGAGG
+SRR12108143.1 NS500717:249:HT5TNBGX5:1:11101:21539:1043 length=26
AAAAA#EEEEEEEEEEEEEEEEEEEE
@SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=26
CGGACNCCAAACCTACACCCACGCGC
+SRR12108143.2 NS500717:249:HT5TNBGX5:1:11101:24889:1044 length=26
A6AAA#/EEEE/EEEEEA/EAA/EEE
@SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=26
ACCTTNACACGTAAGGAAGCTTTTGC
+SRR12108143.3 NS500717:249:HT5TNBGX5:1:11101:8341:1044 length=26
6AAAA#EE/EEEEE/EE6EEEEEEEE
@SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=26
GCTGTNCCTCGTCTTCTTCACTAAAG
+SRR12108143.4 NS500717:249:HT5TNBGX5:1:11101:17829:1044 length=26
A6AAA#E6EE/EEEEEE/AEAEE//E
@SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=26
ACCGTNACACAAGCCCTTACGTTGTG
+SRR12108143.5 NS500717:249:HT5TNBGX5:1:11101:4268:1044 length=26
/AAAA#EEEEAEEEEEEAEEEEEEEE

head -200 SRR12108143_3.fastq | tail -20
@SRR12108143.46 NS500717:249:HT5TNBGX5:1:11101:25528:1060 length=49
NGAGCCAAAGCCCCCAGTGTTTGTATTTTGACGCCAAGCTTCACTTTAA
+SRR12108143.46 NS500717:249:HT5TNBGX5:1:11101:25528:1060 length=49
#A//AEE/EEEA6EE<EE///<A/<E/AEEA//<<//EAE<//6E/EEE
@SRR12108143.47 NS500717:249:HT5TNBGX5:1:11101:22932:1060 length=49
NATCACTACTCTAGGAAGGAAGGAAAACCTGACAACCTGAGCAAGAAGG
+SRR12108143.47 NS500717:249:HT5TNBGX5:1:11101:22932:1060 length=49
#A/AA/EE///</E//E/<////E///E/EA</<//<E/6<<//EE/A<
@SRR12108143.48 NS500717:249:HT5TNBGX5:1:11101:15378:1060 length=49
NAAAACATGATTGTCGAGGAGTGTGGCTGCTCCTAGAGTCGCGAGGTAC
+SRR12108143.48 NS500717:249:HT5TNBGX5:1:11101:15378:1060 length=49
#//AAA/AEAEA/A/EEE//A//AE<AAAAE<AA/E/EE</6///66/A
@SRR12108143.49 NS500717:249:HT5TNBGX5:1:11101:20756:1060 length=49
NGATCAACATTACTTATTGTTTGAATACTACGACAACTAAAATTTCACT
+SRR12108143.49 NS500717:249:HT5TNBGX5:1:11101:20756:1060 length=49
#/AAAEE6//EAE<<AA<////////<//E6/<////AAAE6//A/<</
@SRR12108143.50 NS500717:249:HT5TNBGX5:1:11101:11867:1061 length=49
NTATAAAGATGTTTAAAAAAGTCAGTTGCTTTTTTCGCGTAATGTGAAT
+SRR12108143.50 NS500717:249:HT5TNBGX5:1:11101:11867:1061 length=49
#A/AA/E/EE////E/EE/A//EEEEEEE/<<//A/E/AEE6A/AEEA<
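The three files above differ in read length (8, 26 and 49 in the records shown), which is a quick way to confirm that the split actually happened. As a minimal sketch, the helper below tabulates the read lengths in a FASTQ file. The function name `read_length_counts` is hypothetical and not part of sra-toolkit, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of sra-toolkit): count reads per read length.
# Assumes unwrapped FASTQ, i.e. exactly 4 lines per record.
read_length_counts() {
    # The sequence is the 2nd line of every 4-line record.
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Usage: summarize each file written by --split-files, if it exists.
for f in SRR12108143_1.fastq SRR12108143_2.fastq SRR12108143_3.fastq; do
    if [ -f "$f" ]; then
        echo "$f"
        read_length_counts "$f"
    fi
done
```

Each output line is a read length followed by the number of reads of that length. If all three files exist and each reports its own characteristic length, the run was split per read as expected.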
3.3 years ago
sasa ▴ 10

Kevin's command above worked for me as well, with or without prefetch. How about installing the latest version of the SRA Toolkit instead of version 2.10.7? Version 2.11.1 is available now. Also, I don't think you need to download the data through AWS in this case. The data storage issues for the free account caused me to panic...
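To apply the same command to all four runs from the question, a loop along the following lines may help. This is only a sketch: the `DRY_RUN` switch and the `run_cmd` and `fetch_runs` helpers are hypothetical names introduced here so the commands can be inspected before anything is downloaded. Running `fastq-dump --version` first shows which toolkit version is installed.

```shell
# Run accessions listed in the question.
RUNS="SRR12108143 SRR12108144 SRR12108145 SRR12108146"

# Hypothetical switch: DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fetch_runs() {
    for run in $RUNS; do
        # Optional step: fetch the .sra file first; the thread reports that
        # the split works with or without it.
        run_cmd prefetch "$run"
        # Expected to write ${run}_1.fastq, ${run}_2.fastq and ${run}_3.fastq.
        run_cmd fastq-dump --split-files "$run"
    done
}

fetch_runs
```

With the default dry-run setting this prints eight commands (one `prefetch` and one `fastq-dump` per run) without touching the network.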
