Question

Help needed with sra-toolkit: fasterq-dump does not produce 3 files as intended

0

2.4 years ago

SeroroO ▴ 50

Hi all!

I'm trying to download the following dataset (PRJNA577618) from NCBI using the SRA Toolkit, specifically fasterq-dump. According to the metadata of the individual runs, there should be 3 output files per run, corresponding to L1, R1, and R2. However, regardless of the settings I've used (e.g., I've tried explicitly passing --split-3 to fasterq-dump, although it is the default), I only get one .fastq file.

Does anyone have any idea why this might be so? Alternatively, does anyone have a solution for splitting the files after downloading them?

Thanks in advance!

fasterq-dump sratoolkit • 1.3k views

updated 2.3 years ago by GenoMax 147k • written 2.4 years ago by SeroroO ▴ 50

score 2 · Accepted Answer · 2022-07-20

2

2.4 years ago

GenoMax 147k

I tried simple fastq-dump with one sample and was able to get three files. You could download the .sra file and then split it later.

 $ fastq-dump -X 5 -F --split-files SRR10278808

::::::::::::::
SRR10278808_1.fastq
::::::::::::::
@NB500909:174:HTM7GBGX7:1:11101:5547:1095
AGGAGATG
+NB500909:174:HTM7GBGX7:1:11101:5547:1095
AAAAAEEE

::::::::::::::
SRR10278808_2.fastq
::::::::::::::
@NB500909:174:HTM7GBGX7:1:11101:5547:1095
AACTCCCGTCNGCTGACTCGTTATGA
+NB500909:174:HTM7GBGX7:1:11101:5547:1095
AAAAAEEEEE#EEEEEAEEEEEEEEE

::::::::::::::
SRR10278808_3.fastq
::::::::::::::
@NB500909:174:HTM7GBGX7:1:11101:5547:1095
ATTTTGAAAAAGAAAAGGCTGGAGATTTCACTGTCATTGATTTTAAGTTTAAAGCACCAGTAATCAAAATTGTATGGTTTAGGCTTAAGGATAGATATATTCATCAGTGANNNNNN
+NB500909:174:HTM7GBGX7:1:11101:5547:1095
AAAAAEEEEEEAAEE/EEEEEEEE//EEEEEAEEEEEEEE/AAEEEEEAEE/<EEEAEEEEEAEEEEEEAEE<E/EE</E<EEE//EEE/</EE/<AAEEEEA/AEEEE<######

2.4 years ago by GenoMax 147k

0

Thank you for this! I didn't think of trying the other, similar tool (fastq-dump) instead of fasterq-dump. I will try this later!

Also, do you know how I could split the files into 3 (L1, R1, R2) after downloading? I did a quick search online, and most results mention deinterleaving. I am not sure this is the right strategy, since most of the tools I found split the input into 2 files (forward and reverse), presumably by assuming that each read occupies a fixed number of alternating lines in the file. However, the .fastq files I've already downloaded include the L1 reads on top of the forward and reverse reads, and I am not sure whether they are suitable for such tools.

2.4 years ago by SeroroO ▴ 50

1

If you download the .sra file with prefetch, you can simply run the command above on that file. If the file you downloaded has the three reads interleaved, it may be difficult to separate them without some custom code.
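As a rough idea of what such custom code could look like, here is a minimal sketch. It assumes (this is not confirmed for your download) that every spot is stored as exactly three consecutive 4-line FASTQ records in the order read 1, read 2, read 3, with no line-wrapped sequences. The function name `split_three_reads` and the output naming are made up for illustration.

```shell
# Hypothetical helper: de-interleave a FASTQ that holds three reads per spot.
# Assumption: records appear strictly as read 1, read 2, read 3 for each spot,
# and every record is exactly 4 lines long.
split_three_reads() {
    local input="$1" prefix="$2"
    awk -v prefix="$prefix" '
        {
            # 0-based index of the 4-line record this line belongs to
            rec = int((NR - 1) / 4)
            # Position of the record within the spot (1, 2 or 3) picks the file
            out = prefix "_" (rec % 3 + 1) ".fastq"
            print > out
        }
        END {
            # 3 records x 4 lines = 12 lines per spot; anything else means
            # the layout assumption does not hold for this file
            if (NR % 12 != 0) {
                print "warning: line count is not a multiple of 12" > "/dev/stderr"
                exit 1
            }
        }
    ' "$input"
}
```

For example, `split_three_reads interleaved.fastq split` would write `split_1.fastq`, `split_2.fastq`, and `split_3.fastq` (both names are placeholders). Before trusting the result, it is worth checking that the read names match across the three files and that the read lengths look like those in the fastq-dump output above; if they do not, the file is probably not interleaved in this way and re-dumping from the .sra file is the safer route.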

2.3 years ago by GenoMax 147k