Hi!!
I have to analyze paired-end RNA-seq reads that are in an unusual format: both paired-end reads are joined in one FASTQ file. I need to split the file into two separate paired-end FASTQ files.
There is a Galaxy tool named FASTQ splitter that can do this:
FASTQ splitter
What it does
Splits a single fastq dataset representing paired-end run into two datasets (one for each end). This tool works only for datasets where both ends have the same length.
Sequence identifiers will have /1 or /2 appended for the split left-hand and right-hand reads, respectively.
Input format
A multiple-fastq file, for example:
@HWI-EAS91_1_30788AAXX:7:21:1542:1758
GTCAATTGTACTGGTCAATACTAAAAGAATAGGATCGCTCCTAGCATCTGGAGTCTCTATCACCTGAGCCCA
+HWI-EAS91_1_30788AAXX:7:21:1542:1758
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh`hfhhVZSWehR
Outputs
Left-hand Read:
@HWI-EAS91_1_30788AAXX:7:21:1542:1758/1
GTCAATTGTACTGGTCAATACTAAAAGAATAGGATC
+HWI-EAS91_1_30788AAXX:7:21:1542:1758/1
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Right-hand Read:
@HWI-EAS91_1_30788AAXX:7:21:1542:1758/2
GCTCCTAGCATCTGGAGTCTCTATCACCTGAGCCCA
+HWI-EAS91_1_30788AAXX:7:21:1542:1758/2
hhhhhhhhhhhhhhhhhhhhhhhh`hfhhVZSWehR
Do you know of any other standalone script that can do this job?
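One possible standalone approach is a short Python script that mimics the behaviour described above. This is only a minimal sketch, not the Galaxy tool's own code: it assumes every record spans exactly four lines (no wrapped sequences), that both ends have the same length, and that the file has no blank lines. The function names `split_joined_record` and `split_joined_fastq` are made up for illustration.

```python
import sys


def split_joined_record(header, seq, plus, qual):
    """Split one FASTQ record holding both ends into (left, right) records."""
    if len(seq) != len(qual) or len(seq) % 2 != 0:
        raise ValueError(f"cannot split record into equal halves: {header}")
    half = len(seq) // 2

    def tag(line, suffix):
        # Append /1 or /2 to the identifier; leave a bare "+" line untouched.
        return line + suffix if len(line) > 1 else line

    left = (tag(header, "/1"), seq[:half], tag(plus, "/1"), qual[:half])
    right = (tag(header, "/2"), seq[half:], tag(plus, "/2"), qual[half:])
    return left, right


def split_joined_fastq(in_path, left_path, right_path):
    """Write the left-hand and right-hand halves of each record to two files."""
    with open(in_path) as src, \
            open(left_path, "w") as out1, \
            open(right_path, "w") as out2:
        while True:
            record = [src.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # end of file (assumes no blank lines)
                break
            left, right = split_joined_record(*record)
            out1.write("\n".join(left) + "\n")
            out2.write("\n".join(right) + "\n")


# Assumed command line: python split_joined.py joined.fastq left.fastq right.fastq
if __name__ == "__main__" and len(sys.argv) == 4:
    split_joined_fastq(sys.argv[1], sys.argv[2], sys.argv[3])
```

Records whose sequence and quality lengths differ, or whose length is odd, raise an error instead of being split silently.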
Just as a side note, this output is what you get when you use 'fastq-dump' from the SRA tools to download paired-end data from the SRA. In that case, there is a nice option: 'fastq-dump --split-files', which will output one file for all the /1 reads and another file for the /2 reads.
I tried this command on an interleaved FASTQ file containing paired reads from both ends, but it did not work, even after I renamed the file from .fq to .sra. Should it have worked?
Hi trakhtenberg,
Hopefully you were able to figure this out, given how long ago this was, but for the benefit of others looking for answers: fastq-dump does not act on local FASTQ files, so renaming .fq to .sra will not help. It instead reads SRA-format data, typically downloading it from the SRA by accession, and converts it into FASTQ files for you. If the SRA accession number (usually SRR#######) stores paired-end reads, you should use the following command:
More information about fastq-dump and other SRA toolkit utilities can be found here: http://www.ncbi.nlm.nih.gov/books/NBK158900/
(edit to make code clearer)
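For a local interleaved FASTQ like the one asked about above, a short script is one option. Here is a minimal sketch in Python, assuming the records alternate strictly between first and second mates, each record spans exactly four lines, and the file has no trailing blank lines. The names `read_records` and `deinterleave` are illustrative and are not part of the SRA Toolkit.

```python
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped records)."""
    lines = iter(handle)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(lines, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(handle, out1, out2):
    """Write alternating records to out1 and out2; return the number of pairs."""
    records = read_records(handle)
    pairs = 0
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not cleanly interleaved")
        out1.write("\n".join(first) + "\n")
        out2.write("\n".join(second) + "\n")
        pairs += 1
    return pairs


# Example use (file names are placeholders):
# with open("interleaved.fq") as src, \
#         open("reads_1.fq", "w") as r1, open("reads_2.fq", "w") as r2:
#     deinterleave(src, r1, r2)
```

The sketch does not check that the two mates of a pair share an identifier, so it is only safe on files known to be properly interleaved.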