Help needed with sra-toolkit: fasterq-dump does not produce the 3 expected files
2.4 years ago
SeroroO ▴ 50

Hi all!

I'm trying to download the dataset PRJNA577618 from NCBI using the sratoolkit, specifically fasterq-dump. According to the metadata of the individual runs, each run should produce 3 output files corresponding to L1, R1, and R2. However, regardless of the settings I use (e.g., I tried passing --split-3 to fasterq-dump explicitly, although it is the default), I only get one .fastq file.

Does anyone have any idea why this might be? Alternatively, does anyone know how I could split the files after downloading them?

Thanks in advance!

fasterq-dump sratoolkit • 1.3k views
2.4 years ago
GenoMax 147k

I tried simple fastq-dump with one sample and was able to get three files. You could download the .sra file and then split it later.

 $ fastq-dump -X 5 -F --split-files  SRR10278808

::::::::::::::
SRR10278808_1.fastq
::::::::::::::
@NB500909:174:HTM7GBGX7:1:11101:5547:1095
AGGAGATG
+NB500909:174:HTM7GBGX7:1:11101:5547:1095
AAAAAEEE

::::::::::::::
SRR10278808_2.fastq
::::::::::::::
@NB500909:174:HTM7GBGX7:1:11101:5547:1095
AACTCCCGTCNGCTGACTCGTTATGA
+NB500909:174:HTM7GBGX7:1:11101:5547:1095
AAAAAEEEEE#EEEEEAEEEEEEEEE

::::::::::::::
SRR10278808_3.fastq
::::::::::::::
@NB500909:174:HTM7GBGX7:1:11101:5547:1095
ATTTTGAAAAAGAAAAGGCTGGAGATTTCACTGTCATTGATTTTAAGTTTAAAGCACCAGTAATCAAAATTGTATGGTTTAGGCTTAAGGATAGATATATTCATCAGTGANNNNNN
+NB500909:174:HTM7GBGX7:1:11101:5547:1095
AAAAAEEEEEEAAEE/EEEEEEEE//EEEEEAEEEEEEEE/AAEEEEEAEE/<EEEAEEEEEAEEEEEEAEE<E/EE</E<EEE//EEE/</EE/<AAEEEEA/AEEEE<######
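
The "download the .sra file and then split it later" route could look roughly like the sketch below. It assumes prefetch and fastq-dump from sra-tools are on your PATH and that fastq-dump picks up the local copy that prefetch saved for the accession; where that copy lands depends on the sra-tools version and configuration. The name fetch_and_split is just a hypothetical wrapper for illustration, not part of sra-tools.

```shell
# Hypothetical helper (not part of sra-tools): download one run, then split it.
# ASSUMPTION: prefetch and fastq-dump are installed and on PATH, and
# fastq-dump resolves the accession to the copy that prefetch downloaded.
fetch_and_split() {
    acc="$1"                          # run accession, e.g. SRR10278808
    prefetch "$acc" || return 1       # download the .sra file first
    fastq-dump --split-files "$acc"   # write one FASTQ file per read in the spot
}

# Usage: fetch_and_split SRR10278808
```

For the run above, this should give the same _1, _2 and _3 files as the direct fastq-dump call.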

Thank you for this! I didn't think of trying the similar tool (fastq-dump) instead of fasterq-dump. I will try this later!

Also, do you know how I could split the files into 3 (L1, R1, R2) after downloading? I did a quick search online, and most results mention deinterleaving. I am not sure this is the right strategy, since most of the tools I found split the input into 2 files (forward and reverse), presumably by assuming that each read occupies a fixed number of alternating lines in the file. However, the .fastq files I've already downloaded include the L1 reads on top of the forward and reverse reads, and I am not sure they are suitable for those tools.


If you download the .sra file with prefetch, you can simply run the command above on that file. If the file you downloaded has the three reads interleaved, it may be difficult to separate them without some custom code.
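
As a rough idea of what such custom code might look like, here is a minimal awk sketch. It only works under strong assumptions that you would need to check against your own file: every record is exactly 4 lines (no wrapped sequences), and the three reads of each spot appear consecutively in a fixed order (read 1, read 2, read 3). The function name split_three_reads and the output naming are hypothetical, chosen just for this example.

```shell
# Hypothetical helper: split a FASTQ file interleaved in groups of three reads.
# ASSUMPTIONS: each record is exactly 4 lines, and the reads of one spot are
# stored back to back in the order read 1, read 2, read 3.
split_three_reads() {
    # $1 = interleaved input FASTQ
    # $2 = output prefix; writes <prefix>_1.fastq, <prefix>_2.fastq, <prefix>_3.fastq
    awk -v prefix="$2" '
        {
            # Within each 12-line group: lines 1-4 go to file 1,
            # lines 5-8 to file 2, lines 9-12 to file 3.
            idx = int(((NR - 1) % 12) / 4) + 1
            print > (prefix "_" idx ".fastq")
        }
    ' "$1"
}

# Usage: split_three_reads interleaved.fastq SRR10278808
```

If the reads were instead concatenated into a single record per spot, this will not work, and the sequences would have to be cut by read length.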
