Does fastq-dump return interleaved files?

5.6 years ago

O.rka ▴ 750

Does fastq-dump return interleaved files?

For example, fastq-dump ERR315863

Here's the head of my file:

(base) -bash-4.1$ head -n 10 ERR315863.fastq
@ERR315863.1 MERCURE:0070:0:8:1:3672:2101 length=94
AACCCCAACCTCCAAGCCCTCTTCAACGATCCCACCACCCTCAATTGGCATAGGTCAGTTTTTTTTTCGGTGCCGGTGAGGAGGCCTAGCTGGC
+ERR315863.1 MERCURE:0070:0:8:1:3672:2101 length=94
FF=FFCFFFE=@EDDD?DE;EED@E8EDEEEA*ED8><)<<;CDEC=EBC/8CDA.DC58579;C//BDB/CDA<@<:=19722),:;<A@CA@
@ERR315863.2 MERCURE:0070:0:8:1:3733:2124 length=186
AAAAGTGATGATCGCGCTAATTTCTTAAGTAAACTTATGAGAATTACACAAAATGTAAAATTTGACATTTATGGAATTCAAAAGCTAAGATTTAGTCCCATTTTAGAGTTAGATATAGCTTTGAAATAATGATCTGCCCATATTGGTTGAATATTTTGAATTCCATAAATGTCAAGTTTTACATTT
+ERR315863.2 MERCURE:0070:0:8:1:3733:2124 length=186
EBEEEABEBEFCBF@EGBGG@DADDEEFFCE6EE/EED6@D/D/.>*,;>8A4:59>A>?+5>A>:>)8=BDD5D=CBBD.?@IGGGGDEBFDHHBHGG;FGHCAD.C,@BC=DBBFFB6BD?EEBE8/*B@@EEEEEFHHEADFF;DFFBAFD8=B=@..?:8.>:-??@;,75.>=6=-<9DEA
@ERR315863.3 MERCURE:0070:0:8:1:3611:2223 length=155
GAGTGATCCTGGGATACTCAATAAATATGATCTCAGTTCTCATAAAGTAGCTATTCATGCTGCGGCGACCGTGAGGAGGTAAGCACTCATCAGGGGGGCAGGCGGGGAACCCATTGAATTCAGTTCCAGCATAATATTCATGAATAATTGGCCAC

This run is from a HiSeq 2000, so I assumed it contains paired R1/R2 reads, but these reads don't look paired: https://www.ebi.ac.uk/metagenomics/runs/ERR315863

What does fastq-dump write with this command when --split-files is not specified?

5.6 years ago

ATpoint 89k

It looks like without --split-3 or --split-files the two reads of each pair are concatenated into a single read. That is odd and utterly useless (I'm not criticizing you, but the NCBI folks who developed this tool), so always use one of the split options by default. In fact, I even do it for single-end files to be sure this kind of mess does not happen. This is another argument against fastq-dump, as this output makes no logical sense at all: you cannot meaningfully align these data. As said, either use --split-3 or download from ENA as suggested below.
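A minimal sketch of the split invocation recommended here. The helper name `dump_paired` and the output directory are made up for illustration; `--split-3` and `--outdir` are fastq-dump's own options.

```shell
# Sketch: dump a paired-end run as separate mate files.
# dump_paired is a hypothetical helper name, not part of the SRA Toolkit.
dump_paired() {
    accession=$1
    outdir=${2:-.}
    # --split-3 writes <accession>_1.fastq and <accession>_2.fastq for paired
    # spots, plus <accession>.fastq for reads left without a mate.
    fastq-dump --split-3 --outdir "$outdir" "$accession"
}

# Usage (needs the SRA Toolkit and network access, so it is not run here):
#   dump_paired ERR315863 fastq/
```

Defining it as a function keeps the flags in one place, so the split option is not forgotten on later runs.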

ADD COMMENT • link 5.6 years ago by ATpoint 89k

Better yet, just download the FASTQs directly and never worry about all the fastq-dump caveats: Fast download of FASTQ files from the European Nucleotide Archive (ENA)

ADD REPLY • link 5.6 years ago by igor 13k

Agreed. I already suggested this in a previous (and related) thread by the OP: Basic fasterq-dump command is failing from `SIGNAL - Segmentation fault`

ADD REPLY • link 5.6 years ago by ATpoint 89k

Sorry. Didn't see that one. Every time I see another fastq-dump question, I get too excited.

I guess that emphasizes the point even more. Clearly fastq-dump is causing too many unnecessary difficulties.

ADD REPLY • link 5.6 years ago by igor 13k

No problem at all. O.rka sorry I misread your initial question and edited my answer.

ADD REPLY • link 5.6 years ago by ATpoint 89k

OK, just making sure I'm not crazy, because these reads definitely look like single-end reads but I'm pretty sure this run is paired. Maybe they just concatenate the reads instead of interleaving?
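One way to test the interleaving hypothesis is to check whether consecutive records share a read ID. This is a rough sketch that assumes plain 4-line FASTQ records and that both mates of a pair carry the same ID (optionally suffixed with /1 and /2); the function name `is_interleaved` is hypothetical.

```shell
# Succeeds (exit 0) if records come in consecutive pairs with matching IDs.
# Assumes unwrapped 4-line FASTQ records.
is_interleaved() {
    awk 'NR % 4 == 1 {
             id = $1                    # first token, e.g. @ERR315863.1
             sub(/\/[12]$/, "", id)     # drop a trailing /1 or /2 if present
             n++
             # every second record must repeat the ID of the one before it
             if (n % 2 == 0 && id != prev) bad = 1
             prev = id
         }
         END {
             if (n > 0 && n % 2 == 0 && !bad) exit 0
             exit 1
         }' "$1"
}

# Usage:
#   if is_interleaved ERR315863.fastq; then echo interleaved; else echo "not interleaved"; fi
```

For the file shown in the question, the IDs run .1, .2, .3 with one record each, so a check of this kind would report it as not interleaved.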

I'll try the ENA method below. I've had issues with Aspera SSH keys in the past, which is why I've avoided this method, but it appears to be the best way.

ADD REPLY • link 5.6 years ago by O.rka ▴ 750

Yes, they are merged into single-end reads, but as I said above it is nonsense that the tool does this. ENA also offers download via regular FTP. If you look up a dataset there, the default download links are FTP paths that you can fetch with, e.g., wget.
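As a rough sketch of the FTP route, the FASTQ paths for a run can be looked up through ENA's file-report endpoint and then fetched with wget. The helper name `ena_report_url` is hypothetical, and the endpoint and field names reflect my understanding of the ENA portal API, so check them against the ENA documentation before relying on this.

```shell
# Build the ENA file-report URL that lists FASTQ locations for a run accession.
# ena_report_url is a hypothetical helper name; the URL layout is an assumption
# based on the ENA portal API and may change.
ena_report_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}

# Usage (requires network access, so it is not run here):
#   curl -s "$(ena_report_url ERR315863)"
#   # The fastq_ftp column holds ';'-separated paths without a scheme;
#   # prefix each one with ftp:// and pass it to wget.
```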

ADD REPLY • link 5.6 years ago by ATpoint 89k
