Hi all!
I'm trying to download the following dataset (PRJNA577618) from NCBI using the sratoolkit, specifically fasterq-dump. According to the metadata of the individual files (for example), there should be 3 output files, corresponding to L1, R1, and R2. However, regardless of the settings I've used (e.g., I've tried explicitly passing --split-3 to fasterq-dump, although it is the default), I am only getting one .fastq file.
Does anyone have any idea why this might be? Alternatively, does anyone have a solution for splitting the files after downloading them?
Thanks in advance!
Thank you for this! I didn't think of trying the other, similar function (fastq-dump) instead of fasterq-dump. Will try this later!
Also, do you know how I could split the files into 3 (L1, R1, R2) after downloading? I did a quick search online, and most results mentioned deinterleaving. I am not sure this is the right strategy, since most of the tools I've found split the data into 2 files (forward and reverse), presumably by assuming that each read takes up a fixed number of alternating lines in the file. However, the .fastq files I've already downloaded include the L1 reads on top of the forward and reverse reads, and I am not sure whether they are suitable for said tools.
If you download the .sra file with prefetch, you can simply use the command above on that file. If the file you downloaded has the three reads interleaved, it may be difficult to separate them without some custom code.
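For the interleaved case, here is a minimal sketch of what such custom code could look like in Python. It is untested against this dataset and rests on assumptions you should check against your file first: every record is exactly 4 lines (no wrapped sequences), and the records repeat strictly in the order L1, R1, R2 for each spot. The function names are made up for this example and are not part of the sratoolkit.

```python
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines, skipping blank lines."""
    lines = (line for line in handle if line.strip())
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        # Assumption: 4-line records with '@' header and '+' separator.
        if (len(record) != 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError("malformed or truncated FASTQ record")
        yield record


def split_interleaved(source, outputs):
    """Write record i of `source` to outputs[i % len(outputs)].

    Assumption: reads are strictly interleaved in a fixed order, so with
    three outputs the records go L1, R1, R2, L1, R1, R2, ...
    Returns the number of complete spots written.
    """
    n = len(outputs)
    count = 0
    for i, record in enumerate(read_records(source)):
        outputs[i % n].writelines(record)
        count += 1
    if count % n != 0:
        raise ValueError(f"{count} records is not a multiple of {n}")
    return count // n
```

To use it, open the downloaded file for reading and three new files for writing, then call split_interleaved(src, [l1, r1, r2]). Before trusting the output, look at the first dozen or so records to confirm the order really is L1, R1, R2; if the three reads were instead concatenated into a single record per spot, this approach does not apply.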