3.2 years ago
nicole.kavanagh
Hi there,
I was wondering if anyone could offer advice on splitting two merged fastq files (R1 and R2) into per-sample fastq files. I've downloaded several biosamples from SRA via ftp, but they are merged into one file and I am unsure how to split them. Thanks!
How can one distinguish the two samples? Do they have individual indexes?
Reading your post again, it seems like you are asking about interleaved data files, where R1 and R2 reads are present next to each other, e.g. R1_1, R2_1, R1_2, R2_2, R1_3, R2_3 etc. If that is the case you can separate the interleaved reads using reformat.sh from the BBMap suite.
If the two read files are "merged" by copying them end to end (R1_file_followed_by_R2_file) then you may need to use split, counting the number of records (4 lines per record).
Thanks so much for your advice. Apologies, I wasn't very clear in my question. I have two files: the first contains R1_1, R1_2, R1_3, R1_4, etc. and the second contains R2_1, R2_2, R2_3, R2_4, so I don't think they fall into interleaved data or merged files. How would I go about splitting them into individual files? An example of the text contained within the fastq is as follows:
Thanks for your help!
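For reference, the two layouts raised in the earlier reply (interleaved reads, and an R1 file followed end to end by an R2 file) could be separated roughly as below. This is only a sketch: it assumes unwrapped FASTQ with exactly 4 lines per record, and the function names are made up for illustration; they are not part of BBMap or any other tool.

```python
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(lines):
    """Split interleaved records (R1_1, R2_1, R1_2, R2_2, ...) into two lists."""
    r1, r2 = [], []
    for i, record in enumerate(read_records(lines)):
        (r1 if i % 2 == 0 else r2).append(record)
    if len(r1) != len(r2):
        raise ValueError("odd number of records; input is not cleanly interleaved")
    return r1, r2


def split_end_to_end(lines):
    """Split 'all R1 records, then all R2 records' at the midpoint.

    Assumes both halves hold the same number of records.
    """
    records = list(read_records(lines))
    if len(records) % 2:
        raise ValueError("odd number of records; cannot split into equal halves")
    half = len(records) // 2
    return records[:half], records[half:]
```

With plain-text files, either function can be fed an open file handle directly, e.g. `deinterleave(open("reads.fastq"))`, where `reads.fastq` is a placeholder name. Neither applies if you already have one R1 file and one R2 file per accession.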
There is no need to split anything. It looks like you have standard paired-end files (2 files) for one sample. You should have 1 pair for SRR3138122, another pair for SRR#### (next accession) etc.
What you're trying to do is demultiplex your paired-end reads. As GenoMax asked, the method is going to depend on how the indexes/barcodes are listed in the file. There have been a number of other questions on this here, so one of them might be helpful: How to split a fastq file into each corresponding sample.fastq?, Demultiplexing fastq.gz files, Split fastq according to barcodes
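If demultiplexing does turn out to be needed, the grouping step might look something like the sketch below. It rests on an assumption that may not hold for your files: that each header line ends with the index sequence after the last colon, as in Illumina-style headers such as `@read1 1:N:0:ATCACG`. Reads downloaded from SRA often do not keep this field, so check your headers first. The function name `demultiplex` is hypothetical.

```python
from collections import defaultdict
from itertools import islice


def demultiplex(lines):
    """Group 4-line FASTQ records by the index sequence in the header.

    Assumption: the header ends with ':<index>', e.g. '@read1 1:N:0:ATCACG'.
    """
    groups = defaultdict(list)
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header = record[0].rstrip("\n")
        if ":" not in header:
            raise ValueError("no index field found in header: " + header)
        # Take whatever follows the final colon as the barcode.
        barcode = header.rsplit(":", 1)[-1]
        groups[barcode].append(record)
    return dict(groups)
```

Each key of the returned dictionary is one barcode, and its records can then be written to a separate per-sample file. For paired-end data the same grouping would have to be applied to R1 and R2 so that the pairs stay in the same order.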