How do I split a combined fastq file downloaded from SRA into separate _1.fq and _2.fq read pairs?
9.1 years ago
simonH ▴ 20

I've downloaded a fastq file from SRA (http://trace.ncbi.nlm.nih.gov/Traces/sra/) containing reads from a paired-end Illumina 101 bp RNA-seq experiment. The only problem is that it contains both reads of each pair in a single file, whereas I need two separate files, one with all the _1.fq reads and the other with all the _2.fq reads.

Can anybody help? I'm aware of the fastq-dump tool within the SRA Toolkit, but I couldn't get it to work when I was originally downloading the data.

Many thanks in advance.

My fastq file looks like this:

$ head sra_data.fastq
@SRR1659960.1.1 1 length=101
NAGAAATGAATGAGCCTACAGATGATAGGATGTTTCATGTGGTGTATGCATCGGGGTAGTCCGAGTAACGTCGGGGCATTCCGGATAGGCCGAGAAAGTGT
+SRR1659960.1.1 1 length=101
#1=BDDDDDHFBFIEHHHHAG<HE@HGGE@HHFGHGGHHFHIHG@FFGGGHIIIIIFAC=F@GEGEECCDCECCBBBBCCCD>9599>C:@>5@9>?CCCD
@SRR1659960.1.2 1 length=101
CCCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCTTCATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACC
+SRR1659960.1.2 1 length=101
<7?BD?DD<DFFABBEHEEFHII>C:BCDD?<C?FFC4E>@DEF>?FGHDFBBCG8??DGGIII:BF@C=FFC;C=D;@?EA76?DDBEC?>>ACCCABBB
@SRR1659960.2.1 2 length=101
NATAAAGTGTATGACAAATATACAAGGCTCCTAATATTGGTTTAACTTGGAGAAGTAGGTAAAGGAAGAAGGGNAAAGGAAATAGACAAAAAGACTACAGT
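
Judging only from the headers shown above, the two mates of a pair appear to share a name and differ in a trailing .1 or .2, with mate 1 immediately followed by mate 2. A rough sketch of a check that the whole file follows that pattern, assuming strict 4-line FASTQ records; `check_interleaved` is a hypothetical helper name, not part of any toolkit:

```shell
# Hypothetical helper: exit status 0 if every header pair looks like name.1 / name.2.
check_interleaved() {
    awk '
        NR % 4 == 1 {                                  # header line of each 4-line record
            name = $1
            mate = substr(name, length(name))          # last character, expected 1 or 2
            base = substr(name, 1, length(name) - 2)   # name without the .1 or .2 suffix
            n++
            if (n % 2 == 1) {
                if (mate != "1") { bad = 1; exit }
                prev = base
            } else if (mate != "2" || base != prev) { bad = 1; exit }
        }
        END { if (bad || n % 2 == 1) exit 1 }          # also fail on an unpaired last read
    ' "$1"
}
```

For example, `check_interleaved sra_data.fastq && echo "looks interleaved"`. If the check fails, the file may contain orphan reads, and splitting by simple alternation would mispair them.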

sequence RNA-Seq • 4.1k views
9.1 years ago

Use Reformat from the BBMap package:

reformat.sh in=sra_data.fastq out1=r1.fq out2=r2.fq interleaved
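
If installing BBMap is not an option, the same split can be approximated with plain awk. This is only a sketch, assuming the file is strictly interleaved (mate 1 then mate 2 for every pair, no orphan reads) and that every record is exactly 4 lines; `deinterleave_fastq` is a hypothetical helper name and is not how Reformat works internally:

```shell
# Hypothetical helper: split an interleaved FASTQ into two mate files.
# $1 = interleaved input, $2 = output for first mates, $3 = output for second mates
deinterleave_fastq() {
    awk -v r1="$2" -v r2="$3" '
        {
            # Records are 4 lines long; records 0, 2, 4, ... (0-based) are mate 1.
            out = (int((NR - 1) / 4) % 2 == 0) ? r1 : r2
            print > out
        }
    ' "$1"
}
```

Usage would be `deinterleave_fastq sra_data.fastq r1.fq r2.fq`. Unlike Reformat, this does no validation of read names, so it is worth checking the pairing first.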

Sorry, I missed this earlier. Thanks! I'm downloading BBMap now and will report back.


Seems to have worked a charm. Thanks!

9.1 years ago
h.mon 35k
fastq-dump --split-files

I've tried this, but I get the following error message:

$ sratoolkit.2.5.4-1-centos_linux64/bin/fastq-dump --split-files sra_data.fastq
2015-10-15T22:59:03 fastq-dump.2.5.4 err: item not found while constructing within virtual database module - the path 'sra_data.fastq' cannot be opened as database or table
9.1 years ago
pevsner ▴ 420

Try this:

fastq-dump --split-files SRR1659960

For a description of the --split-files argument try:

fastq-dump --help

You can track the progress of the download by checking the file sizes in your directory:

ls -lh SRR1659960_*
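
Once the download has finished, a quick sanity check is to confirm that both output files hold the same number of reads. A minimal sketch, assuming 4-line FASTQ records and the usual `SRR1659960_1.fastq` / `SRR1659960_2.fastq` output names; `count_fastq_reads` is a hypothetical helper:

```shell
# Hypothetical helper: print the number of reads in a 4-line-per-record FASTQ file.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Example use after fastq-dump has finished (file names are assumed):
#   [ "$(count_fastq_reads SRR1659960_1.fastq)" -eq "$(count_fastq_reads SRR1659960_2.fastq)" ] \
#       && echo "read counts match"
```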

Thanks, I'm doing this now. I was hoping to find a way that didn't involve re-downloading the whole dataset (it's 49 GB in compressed form), but it looks like I'll have to. Cheers.
