Splitting fastq files
3.2 years ago

Hi there,

I was wondering if anyone could offer advice on splitting two merged fastq files (R1 and R2) into per-sample fastq files. I've downloaded several biosamples from SRA via ftp, but they are merged into one file and I am unsure how to split them. Thanks!

fastq ncbi sra • 1.5k views

How can one distinguish the two samples? Do they have individual indexes?

Reading your post again, it seems you are asking about interleaved data files, where the R1 and R2 reads of each pair sit next to each other, e.g. R1_1, R2_1, R1_2, R2_2, R1_3, R2_3, etc. If that is the case, you can separate the interleaved reads using reformat.sh from the BBMap suite.

reformat.sh in=interleaved.fq out1=R1.fq out2=R2.fq
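If BBMap is not available, the same separation can be sketched in a few lines of Python. This is only a minimal illustration, not what reformat.sh does internally: the function name `deinterleave` is made up for this example, and it assumes strict 4-line records that alternate R1, R2, R1, R2 with no wrapped sequence lines.

```python
from itertools import islice


def deinterleave(src, out1, out2):
    """Copy alternating 4-line FASTQ records from src to out1 (R1) and out2 (R2).

    Hypothetical helper for illustration. Assumes every record is exactly
    4 lines and that mates alternate R1, R2, R1, R2, ...
    Returns the number of read pairs written.
    """
    pairs = 0
    while True:
        rec1 = list(islice(src, 4))  # next R1 record (4 lines)
        if not rec1:
            break  # clean end of input
        rec2 = list(islice(src, 4))  # its R2 mate
        if len(rec1) != 4 or len(rec2) != 4:
            raise ValueError("truncated record or read without a mate")
        out1.writelines(rec1)
        out2.writelines(rec2)
        pairs += 1
    return pairs


# Example usage with the file names from the command above:
# with open("interleaved.fq") as src, \
#         open("R1.fq", "w") as out1, open("R2.fq", "w") as out2:
#     print(deinterleave(src, out1, out2), "pairs written")
```

Reading record by record keeps memory use flat, which matters for large sequencing runs.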

If the two read files are "merged" by copying them end to end (the R1 file followed by the R2 file), then you may need to use split, counting records to find the boundary between the two (each record is 4 lines).
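For the end-to-end case, the record-counting idea could look roughly like the sketch below. The name `split_concatenated` is hypothetical, and the sketch assumes that R1 and R2 contain the same number of reads and that every record is exactly 4 lines, so the boundary sits at the halfway line.

```python
def split_concatenated(src, out1, out2):
    """Split an 'R1 file followed by R2 file' FASTQ at its midpoint.

    Hypothetical helper for illustration. Assumes 4-line records and an
    equal number of R1 and R2 reads. src must be seekable because it is
    read twice: once to count lines, once to copy them.
    Returns the number of reads written to each output.
    """
    total_lines = sum(1 for _ in src)
    # Two equal halves of 4-line records means a multiple of 8 lines.
    if total_lines % 8 != 0:
        raise ValueError("line count is not two equal sets of 4-line records")
    half = total_lines // 2
    src.seek(0)
    for i, line in enumerate(src):
        (out1 if i < half else out2).write(line)
    return half // 4


# Example usage (file names are placeholders):
# with open("merged.fq") as src, \
#         open("R1.fq", "w") as out1, open("R2.fq", "w") as out2:
#     print(split_concatenated(src, out1, out2), "reads per file")
```

It is worth confirming afterwards that the first header in each output carries the expected mate, since the midpoint assumption fails silently if reads were filtered from only one file.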


Thanks so much for your advice. Apologies, I wasn't very clear in my question. I have two files: the first contains R1_1, R1_2, R1_3, R1_4, etc., and the second contains R2_1, R2_2, R2_3, R2_4, etc., so I don't think they count as interleaved or merged files. How would I go about splitting them into individual files? An example of the text contained within the fastq is as follows:

@SRR3138122.104397 104397/1
CAGCATGAGTGGCTATATTCATATATTCATTGACTCGATCTTCATTACTGTCAATAGCCAGTACTTCTTGTCCAGCTTCGATCAGTGTACGGCAGATGCTTCCGCCAAATCGTCCAAGGC
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGFGGGGGGGGGGGGGGGGGGGGGGGCFGGGGGGGGDFGGGGFGGGGGGGGGGGGGGGGGGGGGGG
@SRR3138122.104398 104398/1
TAGACATGCCGACAAAGAACCCTAAAACACGTAAGAAACCAAAAGTCAAAATCGGTGACATGGTTCGCTGCGAAGCAGAAGGGTTCATCTATCCGTTTCGTGGATATGTAGAACATGTTTATGATCACTCAGCAATCATTCGCATTGAAAACACGATGGAATGCGACAAGTGGTTAGCGAAAAGCAAAGATAATTTAGCAGTGGCTCGATTGGTGGATATGGAACTAATC
+
A6A@@FFFGFFGGGGFEGGFFEFGFGGGGGGGGGGGGFGGGGGGGGDFGGGF@EEG:FGGGF<CE@F@FGGGGGGGGCFDGGC<EFFGGGGGFGDFGGGGGFGGFFGGGGFAFFFFFFFFGCFFGFGGGGGCFF,5AFGDFG>FEGGGFGGEGGGGCGGGGGGGGGGGGGEDDEC=<FGGGGGA<CFFGGGGGGGEGGGGDBECE@7CECCF,?EEFCGGG9BECE9FCB
@SRR3138122.104399 104399/1
CTTTGTGGCAAAAGCGGAGAATCGCCATTGCTCCCTTGAAAACGTCTTTCCTGATTCCATTGTGTCTTCCCATGTCGGGTAAAATATAGTTCCATACGCAGCAAACTCCTGTCTTTCTTTTATTTTACTCTATAATAAACTGTCTAATCCAGCTTGTCATTTTCATAGTGATGCGTCAATTCTAAATTAGGGAATCCTTGCTGGCGAAGTGCTTCATACACTATCATTGCAGCTGTATTCGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGFFGGGGGGGGGGGGGGGGFGFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGEGCFEFCFGGCGCGGGGGGGGGGGGGFGGGGGGGGGGGGGCFGGGGGGGGGGGGGGGGFGGG@FGFGGGGGGGGGGGGGEC>CCEFGFGGGFGGGFGGGGGFFGGGGGCACDE62,
@SRR3138122.104400 104400/1
CGTTGTTTCTGCTTTCTTGGAAAGACACCCTGATTTTGAGAGGATTGATGT
+
CCCCCGGGFGFGFGGAFG<FCFGGFGGGGG8FDFGGGD@FECF@FG?@FFF
@SRR3138122.104401 104401/1
GCTTTGAAAAATATTATTGCTATTGGTGCTGGCGCTATTCACGGCTTAGGGTTTGGCGATAATGCAAAGGCAGCAATCATGACGCGCGGATTAGCTGAGATCAGCCGTTTAGGTGTAGCAATGGATGCCAACCCGCTGACCTTTATCGGATTAAGCGGAGTAGGCGATCTAGTCGTTACTTGCACAAGTGTTCATTCTAGGAACTGGCGCGCAGGCAAATTGTTAGGACAAGGGCAGCCCCTAGAAG
+
A-6CCFGGF9AFEFGE9F9C,<<,<CEF,C,CFGGGFDF9@FFGGG+FC,CFGGC7@7CBFEGGCFC<AFCDEEEFCFFGFAGG@:F:E+CEEGGFGGGFF<F9FGGG+BFB8<=?BFF<FFC8<EF,?E=<+BC>:B@@8F<FB9AB7++33=>,3+5CFCACCF@*>:,>FFFFD:*;FCEBAFF7,?FCGFC;;;@,,6;<EC8C:838*3:55**<+2?;9>6++=88/;CE*;*;*:8:0<F

Thanks for your help!


There is no need to split anything. It looks like you have standard paired-end files (two files per sample) for a single sample. You should have one pair for SRR3138122, another pair for SRR#### (the next accession), etc.
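If you want to reassure yourself that the two files really are a matched pair, one quick check is to compare read IDs record by record. The sketch below is only an illustration: the helper names are made up, and it assumes the R2 headers mirror the R1 headers shown above with a trailing /2 instead of /1, which should be verified against the actual R2 file.

```python
from itertools import islice, zip_longest


def read_id(header):
    """Return the read ID from a FASTQ header line, without a /1 or /2 suffix.

    For '@SRR3138122.104397 104397/1' this gives 'SRR3138122.104397'.
    """
    fields = header[1:].split()
    if not header.startswith("@") or not fields:
        raise ValueError("not a FASTQ header: %r" % header)
    token = fields[0]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def mates_in_sync(r1_lines, r2_lines):
    """True if both inputs hold the same read IDs in the same order.

    Assumes 4-line records, so headers are every 4th line starting at 0.
    """
    headers1 = islice(r1_lines, 0, None, 4)
    headers2 = islice(r2_lines, 0, None, 4)
    for h1, h2 in zip_longest(headers1, headers2):
        if h1 is None or h2 is None:
            return False  # one file has more reads than the other
        if read_id(h1) != read_id(h2):
            return False
    return True


# Example usage (file names are placeholders):
# with open("SRR3138122_1.fastq") as r1, open("SRR3138122_2.fastq") as r2:
#     print(mates_in_sync(r1, r2))
```

If this returns True for each accession, the files can go straight into a paired-end aligner or assembler as they are.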


What you're trying to do is demultiplex your paired-end reads. As GenoMax asked, the method is going to depend on how the indexes/barcodes are listed in the file. There have been a number of other questions on this here, so one of them might be helpful: "How to split a fastq file into each corresponding sample.fastq?", "Demultiplexing fastq.gz files", and "Split fastq according to barcodes".
