Does fastq-dump return interleaved files?
4.9 years ago
O.rka ▴ 740

Does fastq-dump return interleaved files?

For example, fastq-dump ERR315863

Here's the head of my file:

(base) -bash-4.1$ head -n 10 ERR315863.fastq
@ERR315863.1 MERCURE:0070:0:8:1:3672:2101 length=94
AACCCCAACCTCCAAGCCCTCTTCAACGATCCCACCACCCTCAATTGGCATAGGTCAGTTTTTTTTTCGGTGCCGGTGAGGAGGCCTAGCTGGC
+ERR315863.1 MERCURE:0070:0:8:1:3672:2101 length=94
FF=FFCFFFE=@EDDD?DE;EED@E8EDEEEA*ED8><)<<;CDEC=EBC/8CDA.DC58579;C//BDB/CDA<@<:=19722),:;<A@CA@
@ERR315863.2 MERCURE:0070:0:8:1:3733:2124 length=186
AAAAGTGATGATCGCGCTAATTTCTTAAGTAAACTTATGAGAATTACACAAAATGTAAAATTTGACATTTATGGAATTCAAAAGCTAAGATTTAGTCCCATTTTAGAGTTAGATATAGCTTTGAAATAATGATCTGCCCATATTGGTTGAATATTTTGAATTCCATAAATGTCAAGTTTTACATTT
+ERR315863.2 MERCURE:0070:0:8:1:3733:2124 length=186
EBEEEABEBEFCBF@EGBGG@DADDEEFFCE6EE/EED6@D/D/.>*,;>8A4:59>A>?+5>A>:>)8=BDD5D=CBBD.?@IGGGGDEBFDHHBHGG;FGHCAD.C,@BC=DBBFFB6BD?EEBE8/*B@@EEEEEFHHEADFF;DFFBAFD8=B=@..?:8.>:-??@;,75.>=6=-<9DEA
@ERR315863.3 MERCURE:0070:0:8:1:3611:2223 length=155
GAGTGATCCTGGGATACTCAATAAATATGATCTCAGTTCTCATAAAGTAGCTATTCATGCTGCGGCGACCGTGAGGAGGTAAGCACTCATCAGGGGGGCAGGCGGGGAACCCATTGAATTCAGTTCCAGCATAATATTCATGAATAATTGGCCAC

This run is HiSeq 2000 so I'm assuming it's R1/R2 reads but these reads don't look paired: https://www.ebi.ac.uk/metagenomics/runs/ERR315863

What did fastq-dump download with this command, given that I did not specify --split-files?

4.9 years ago
ATpoint 86k

It looks like, without --split-3 or --split-files, the two reads of each pair are concatenated into a single read. That is odd and utterly useless (I'm not criticizing you but the NCBI folks who developed this tool), so be sure to always use a split option by default. In fact, I even do it for single-end files to make sure this kind of mess does not happen. This is another argument against fastq-dump, as this output makes no logical sense at all. You cannot meaningfully align these data. As said, either use --split-3 or download from ENA as suggested below.
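As a rough sketch of the split approach, the commented command below is what I would run, followed by a sanity check that both mate files hold the same number of records. The helper names `count_records` and `mates_match` are made up for illustration, and the check assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Number of records in an uncompressed FASTQ file (assumes 4 lines per record)
count_records() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Succeeds only if both mate files contain the same number of records
mates_match() {
  [ "$(count_records "$1")" -eq "$(count_records "$2")" ]
}

# Example (needs the SRA Toolkit and network access, so left commented out):
# fastq-dump --split-3 ERR315863
# mates_match ERR315863_1.fastq ERR315863_2.fastq && echo "mate files are in sync"
```

With --split-3, paired reads should land in ERR315863_1.fastq and ERR315863_2.fastq, and any reads without a mate in ERR315863.fastq.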


Better yet, just download the FASTQs directly and never worry about all the fastq-dump caveats: Fast download of FASTQ files from the European Nucleotide Archive (ENA)


Agreed. I already suggested this in a previous (and related) thread by the OP: Basic fasterq-dump command is failing from `SIGNAL - Segmentation fault`


Sorry. Didn't see that one. Every time I see another fastq-dump question, I get too excited.

I guess that emphasizes the point even more. Clearly fastq-dump is causing too many unnecessary difficulties.


No problem at all. O.rka, sorry, I misread your initial question and have edited my answer.


OK, just making sure I’m not crazy, because these reads definitely look like single-end reads but I’m pretty sure this run is paired. Maybe they just concatenate the reads instead of interleaving?

I’ll try the ENA method below. I’ve had issues with Aspera SSH keys in the past, which is why I’ve avoided this method, but it appears to be the best way.
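One way to test the interleaving question directly is to compare the IDs of adjacent records: in an interleaved file, records 1 and 2 share an ID, then records 3 and 4, and so on. The sketch below does that with awk. The function names `fastq_ids` and `is_interleaved` are hypothetical, and it assumes uncompressed FASTQ with four lines per record and IDs that are either identical for both mates or differ only by a trailing /1 or /2.

```shell
# Print the read ID of every record: first header token, without "@" or a /1, /2 suffix
fastq_ids() {
  awk 'NR % 4 == 1 { sub(/^@/, "", $1); sub(/\/[12]$/, "", $1); print $1 }' "$1"
}

# Print "interleaved" if records come in adjacent same-ID pairs, else "not interleaved"
is_interleaved() {
  fastq_ids "$1" | awk '
    NR % 2 == 1 { prev = $1; next }   # odd record: remember its ID
    $1 != prev  { bad = 1 }           # even record: must match the previous ID
    END {
      if (NR > 0 && NR % 2 == 0 && !bad) print "interleaved"
      else print "not interleaved"
    }'
}
```

Running `is_interleaved ERR315863.fastq` on the file shown in the question should print "not interleaved", since the first records are ERR315863.1, ERR315863.2, ERR315863.3 with no repeats.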


Yes, they are merged into single-end reads, but as I said above, it is nonsense that the tool does this. ENA also offers download via plain FTP. When you open a dataset, the default download paths are FTP links, which you can fetch with e.g. wget.
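If you would rather look up the FTP paths from the command line than from the website, something like the sketch below might help. It only builds a query URL for the ENA Portal API file report. The function name `ena_filereport_url` is made up, and the endpoint and its parameters are written from memory, so check them against the current ENA documentation before relying on them.

```shell
# Build an ENA Portal API "filereport" URL that lists the FASTQ FTP paths of a run
# (endpoint and parameters are assumptions, verify against the ENA docs)
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}

# Example (needs network access, so left commented out):
# curl -s "$(ena_filereport_url ERR315863)"
# Each path listed in the fastq_ftp column can then be fetched with wget.
```

For a paired run I would expect the report to list two files, one per mate, which avoids the fastq-dump splitting question entirely.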
