5.6 years ago
rse
Hi,
I have the following fastq files: *_L001_R1.fastq, *_L001_R2.fastq, *_L002_R1.fastq, *_L002_R2.fastq, *_L001_I1.fastq, *_L001_I2.fastq, *_L002_I1.fastq, *_L002_I2.fastq
How do I merge these files?
Thank you.
You mean: cat file1 file2 file3 > mergefile ?
Yes, I want to merge these files into paired-end files, R1.fastq and R2.fastq, so I will merge all the R1 files and all the R2 files using separate cat commands. But I am not sure what to do with I1 and I2. Do I just ignore them?
Perhaps first explain what the goal of the merging is.
If it is simply to 'reduce' the number of files, then yes, cat (as suggested by zhangdengwei) will do, but that actually makes little 'biological/technical' sense.
I want to merge these files into paired-end files, R1.fastq and R2.fastq, so I will merge all the R1 files and all the R2 files using separate cat commands. But I am not sure what to do with I1 and I2. Do I just ignore them?
OK, then cat is NOT the correct approach. What you should look for are tools that can create interleaved fastq files starting from separate fastq files. Simply cat-ing R1 and R2 together will not generate a valid paired-end (interleaved) fastq file. I still don't fully get why you want to merge them, though; most programs will expect two files when processing paired-end data anyway (or, at best, the interleaved format as explained above).
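To illustrate what an interleaving tool does (this is not a replacement for an established tool), here is a minimal Python sketch. It assumes uncompressed FASTQ with exactly 4 lines per record and mates stored in the same order in R1 and R2; the function names and the script name `interleave.py` are hypothetical. It writes each R1 record followed by its R2 mate and refuses to continue if the mates fall out of sync.

```python
import itertools
import sys
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, '+' line, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield FASTQ records, assuming 4 lines per record (no wrapped sequences)."""
    while True:
        lines = [handle.readline().rstrip("\r\n") for _ in range(4)]
        if not lines[0]:
            return  # end of file
        header, seq, plus, qual = lines
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Read name without '@', the comment field, or a trailing /1 or /2."""
    name = (header[1:].split() or [""])[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write R1 and R2 records alternately to `out`; return the number of pairs."""
    pairs = 0
    for rec1, rec2 in itertools.zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        out.write("\n".join(rec1 + rec2) + "\n")  # 8 lines: R1 record, then R2 record
        pairs += 1
    return pairs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python interleave.py merged_R1.fastq merged_R2.fastq > interleaved.fastq
    with open(sys.argv[1]) as fh1, open(sys.argv[2]) as fh2:
        interleave(fh1, fh2, sys.stdout)
```

The name check matters because a plain cat of R1 and R2 loses exactly this pairing information; an interleaved file keeps each mate directly after its partner.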
OK, thank you. Yes, I have 4 files (2 R1 and 2 R2 files from different lanes), so I am merging the 4 files into 2 files.
Does anyone know how to handle the I1 and I2 files? Thank you.
Those (the I files, I mean) you can omit; they are index files and are not needed for typical downstream analysis.
If you want to join the two R1 files together and then the two R2 files, you could use cat (but make sure you keep the order correct). If you want to join R1 with R2, then you will have to go for the interleaved format.
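"Keep the order correct" means the lanes must be concatenated in the same order for R1 and for R2, so that read n in the merged R1 is still the mate of read n in the merged R2. A minimal Python sketch of that check, assuming all files belong to one sample and follow the `_L001_R1.fastq` naming from the question (the regex and the function names `group_by_read`, `concatenate` and `merge_lanes` are hypothetical; adjust them to your real file names):

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical pattern based on the names in the question, e.g. sample_L001_R1.fastq.
# bcl2fastq output often looks like sample_S1_L001_R1_001.fastq.gz, so adapt as needed.
LANE_RE = re.compile(r"_L(?P<lane>\d{3})_(?P<read>[RI][12])\.fastq$")


def group_by_read(names: Iterable[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Group one sample's files by read type (R1, R2, I1, I2), sorted by lane."""
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name in names:
        m = LANE_RE.search(Path(name).name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[m["read"]].append((int(m["lane"]), name))
    return {read: sorted(items) for read, items in groups.items()}


def concatenate(paths: List[str], out_path: str) -> None:
    """Byte-for-byte equivalent of `cat path1 path2 ... > out_path`."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_lanes(names: Iterable[str], out_prefix: str) -> Dict[str, List[str]]:
    """Merge R1 lanes and R2 lanes separately, using the same lane order for both.

    Returns the input files used for each output so the order can be inspected.
    """
    groups = group_by_read(names)
    r1_lanes = [lane for lane, _ in groups.get("R1", [])]
    r2_lanes = [lane for lane, _ in groups.get("R2", [])]
    if not r1_lanes or r1_lanes != r2_lanes:
        raise ValueError(f"R1 lanes {r1_lanes} do not match R2 lanes {r2_lanes}")
    used = {}
    for read in ("R1", "R2"):
        paths = [name for _, name in groups[read]]
        concatenate(paths, f"{out_prefix}_{read}.fastq")
        used[read] = paths
    return used
```

For two lanes this does the same as running `cat sample_L001_R1.fastq sample_L002_R1.fastq > sample_R1.fastq` and the matching command for R2; the only addition is that the lane order is derived from the file names rather than typed by hand.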
OK, understood. Thank you for the help.
lieven.sterck: That is only correct if one has no interest in the index sequences (not sure why one would run these samples as indexed in the first place, but stuff happens).
It sounds like these samples are not demultiplexed. The index reads are present in separate files. This type of data is generally required for Qiime analysis.
rse: Are these 16S/metagenomic samples? If so, you will need to make use of those I* files. If these are not for Qiime analysis, are you interested in separating the samples based on the index sequences?
I stand corrected. Indeed, I jumped to conclusions too soon. Of course the index files are useful (and required) for some analyses.
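To make "separating the samples based on the index sequences" concrete, here is a minimal Python sketch of exact-match demultiplexing. It assumes R1, R2, I1 and I2 are uncompressed, 4 lines per record, in the same read order, and that mates share the same first header token (Casava 1.8+ style headers). The `samples` mapping from (I1, I2) sequence pairs to sample names stands in for a sample sheet and is hypothetical, as is the function name `demultiplex`. Dedicated demultiplexers (and Qiime's own demultiplexing step) usually tolerate index mismatches, which this sketch does not.

```python
import itertools
from collections import defaultdict
from typing import Dict, Iterator, List, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, quality)
Pair = Tuple[Record, Record]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (no wrapped sequence lines assumed)."""
    while True:
        lines = tuple(handle.readline().rstrip("\r\n") for _ in range(4))
        if not lines[0]:
            return  # end of file
        yield lines


def demultiplex(
    r1: TextIO, r2: TextIO, i1: TextIO, i2: TextIO,
    samples: Dict[Tuple[str, str], str],
) -> Dict[str, List[Pair]]:
    """Bin read pairs by exact match of their (I1, I2) index sequences.

    `samples` maps (index1, index2) -> sample name; unmatched pairs go to
    "undetermined". Everything is kept in memory, which is fine for a sketch only.
    """
    bins: Dict[str, List[Pair]] = defaultdict(list)
    streams = (read_fastq(h) for h in (r1, r2, i1, i2))
    for recs in itertools.zip_longest(*streams):
        if None in recs:
            raise ValueError("R1, R2, I1 and I2 have different read counts")
        names = {rec[0].split()[0] for rec in recs}  # same read name in all 4 files
        if len(names) != 1:
            raise ValueError(f"files out of sync at reads: {sorted(names)}")
        rec1, rec2, idx1, idx2 = recs
        sample = samples.get((idx1[1], idx2[1]), "undetermined")
        bins[sample].append((rec1, rec2))
    return dict(bins)
```

This is also why the I files cannot simply be dropped for such data: once R1 and R2 are merged without them, the index sequence that identifies each read's sample is gone.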