Does fastq-dump return interleaved files?
For example,
fastq-dump ERR315863
Here's the head of my file:
(base) -bash-4.1$ head -n 10 ERR315863.fastq
@ERR315863.1 MERCURE:0070:0:8:1:3672:2101 length=94
AACCCCAACCTCCAAGCCCTCTTCAACGATCCCACCACCCTCAATTGGCATAGGTCAGTTTTTTTTTCGGTGCCGGTGAGGAGGCCTAGCTGGC
+ERR315863.1 MERCURE:0070:0:8:1:3672:2101 length=94
FF=FFCFFFE=@EDDD?DE;EED@E8EDEEEA*ED8><)<<;CDEC=EBC/8CDA.DC58579;C//BDB/CDA<@<:=19722),:;<A@CA@
@ERR315863.2 MERCURE:0070:0:8:1:3733:2124 length=186
AAAAGTGATGATCGCGCTAATTTCTTAAGTAAACTTATGAGAATTACACAAAATGTAAAATTTGACATTTATGGAATTCAAAAGCTAAGATTTAGTCCCATTTTAGAGTTAGATATAGCTTTGAAATAATGATCTGCCCATATTGGTTGAATATTTTGAATTCCATAAATGTCAAGTTTTACATTT
+ERR315863.2 MERCURE:0070:0:8:1:3733:2124 length=186
EBEEEABEBEFCBF@EGBGG@DADDEEFFCE6EE/EED6@D/D/.>*,;>8A4:59>A>?+5>A>:>)8=BDD5D=CBBD.?@IGGGGDEBFDHHBHGG;FGHCAD.C,@BC=DBBFFB6BD?EEBE8/*B@@EEEEEFHHEADFF;DFFBAFD8=B=@..?:8.>:-??@;,75.>=6=-<9DEA
@ERR315863.3 MERCURE:0070:0:8:1:3611:2223 length=155
GAGTGATCCTGGGATACTCAATAAATATGATCTCAGTTCTCATAAAGTAGCTATTCATGCTGCGGCGACCGTGAGGAGGTAAGCACTCATCAGGGGGGCAGGCGGGGAACCCATTGAATTCAGTTCCAGCATAATATTCATGAATAATTGGCCAC
This run is HiSeq 2000, so I'm assuming it's R1/R2 reads, but these reads don't look paired:
https://www.ebi.ac.uk/metagenomics/runs/ERR315863
What did fastq-dump download with this command, given that I did not specify --split-files?
Better yet, just download the FASTQs directly and never worry about all the fastq-dump caveats: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
Agreed. I already suggested this in a previous (and related) thread by the OP: Basic fasterq-dump command is failing from `SIGNAL - Segmentation fault`
Sorry. Didn't see that one. Every time I see another fastq-dump question, I get too excited.
I guess that emphasizes the point even more. Clearly fastq-dump is causing too many unnecessary difficulties.
No problem at all. O.rka, sorry, I misread your initial question and edited my answer.
OK, just making sure I'm not crazy, because these reads definitely look like single-end reads, but I'm pretty sure this run is paired. Maybe they just concatenate the reads instead of interleaving?
I'll try the ENA method below. I've had issues with Aspera SSH keys in the past, which is why I've avoided this method, but it appears to be the best way.
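One way to check the concatenation idea is to rerun the dump with --split-files and compare the number of records in each output file. This is only a rough sketch: count_reads is a hypothetical helper, it assumes plain 4-line FASTQ records, and the _1/_2 file names are what I would expect for a paired run, not something confirmed for ERR315863.

```shell
# Hypothetical helper: count FASTQ records, assuming exactly 4 lines per record.
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# Re-run the dump with one file per read of each spot
# (needs sra-tools and network access, so it is left commented out):
#   fastq-dump --split-files ERR315863
#   count_reads ERR315863_1.fastq
#   count_reads ERR315863_2.fastq   # only present if the run really has two reads per spot
```

If both files exist and the counts match, the run is paired and the default output above was the two mates joined into one sequence per spot.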
Yes, they are merged into single-end reads, but as I said above, it is nonsense that the tool does this. ENA also offers download via normal FTP. If you open a dataset page, the default download paths are FTP links, which you can fetch with e.g. wget.
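For the plain FTP route, something like the following might work without any Aspera keys. It is a sketch under assumptions: ena_report_url and ena_report_to_urls are hypothetical helper names, and it assumes the ENA filereport endpoint returns a tab-separated table whose second column (fastq_ftp) lists semicolon-separated paths without the ftp:// prefix.

```shell
# Hypothetical helper: build the ENA filereport query for one run accession.
ena_report_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}

# Hypothetical helper: read the tab-separated report on stdin and print one URL per line.
# Assumes a header row, and semicolon-separated paths without a scheme in column 2.
ena_report_to_urls() {
  awk -F'\t' 'NR > 1 && $2 != "" {
    n = split($2, f, ";")
    for (i = 1; i <= n; i++) print "ftp://" f[i]
  }'
}

# Usage (needs network access, so it is left commented out):
#   curl -s "$(ena_report_url ERR315863)" | ena_report_to_urls | xargs -n 1 wget
```

A paired run should list two files here (typically ending in _1.fastq.gz and _2.fastq.gz), which also answers the paired-or-not question before downloading anything.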