Question

Merge fastq reads for several samples

7.5 years ago

rim.klabi • 0

Hello

I have 400 fastq files from different samples, produced in two sequencing runs. Both runs were on an Illumina HiSeq. How can I merge the .fastq files of both runs for each sample in one step? Of course, R1 and R2 have to be kept separate. I know that I can merge the .fastq files of both runs using cat, but that handles only one sample at a time, so I would have to repeat it for every sample, and I have more than 100 samples. What command should I use? Do I need to prepare any folders?

Thank you for helping me

next-gen • 6.7k views

ADD COMMENT • link updated 7.5 years ago by swbarnes2 15k • written 7.5 years ago by rim.klabi • 0

You'll need a for loop for this. To work out how to write the command, we need to know how your files are named: which naming pattern do you use to distinguish samples, lanes, and read direction?
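To illustrate why the naming pattern matters, here is a minimal sketch that splits a file name into sample, lane, and read direction. It assumes bcl2fastq-style names of the form <sample>_S<n>_L<lane>_R<1|2>_<nnn>.fastq.gz, which the original poster has not confirmed; the function name parse_fastq_name is made up for this example.

```shell
# Sketch: split a bcl2fastq-style file name into sample, lane and read direction.
# ASSUMPTION: names look like <sample>_S<n>_L<lane>_R<1|2>_<nnn>.fastq(.gz)
parse_fastq_name() {
    # Prints "sample lane read" for a matching name, nothing otherwise.
    basename "$1" |
        sed -nE 's/^(.+)_S[0-9]+_(L[0-9]+)_(R[12])_[0-9]{3}\.fastq(\.gz)?$/\1 \2 \3/p'
}
```

For example, parse_fastq_name sampleA_S3_L001_R1_001.fastq.gz prints "sampleA L001 R1". If your files follow a different scheme, the sed pattern is the only part that needs to change.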

ADD REPLY • link 7.5 years ago by WouterDeCoster 48k

Why are you merging reads again?

ADD REPLY • link 7.5 years ago by mforde84 ★ 1.4k

OP is not merging the reads but merging the file pieces for each sample. In the past, bcl2fastq used to break files up into chunks of 2 million reads.

ADD REPLY • link 7.5 years ago by GenoMax 151k

Maybe I'm just being too literal here, but he seems to be saying that he's merging reads from two independent runs. If that's the case, he should probably treat each run as a technical replicate, or consider merging only after correcting for batch effects.

ADD REPLY • link 7.5 years ago by mforde84 ★ 1.4k

I have to merge reads from the two runs in order to increase the number of reads.

ADD REPLY • link 7.5 years ago by rim.klabi • 0

1. Prepare two folders, one per run; the same sample should have the same file name in both runs. Also create a combined folder.
2. Loop over the files in the first run's folder; for each file name you get, do step 3.
3. Run cat run1/samplename.R1.fq run2/samplename.R1.fq > combined/samplename.R1.fq and cat run1/samplename.R2.fq run2/samplename.R2.fq > combined/samplename.R2.fq
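The three steps above can be sketched as a small shell function. This is only a sketch under the stated assumption that a sample's file has exactly the same name in both run folders; the function name merge_runs and the *.fq extension are placeholders to adapt to your data.

```shell
# Sketch: merge per-sample FASTQ files from two runs into a combined folder.
# ASSUMPTION: the same file name is used in both run folders,
# e.g. run1/sampleA.R1.fq and run2/sampleA.R1.fq.
merge_runs() {
    local run1=$1 run2=$2 out=$3
    local f name
    mkdir -p "$out"
    for f in "$run1"/*.fq; do
        [ -e "$f" ] || continue            # run1 holds no matching files
        name=$(basename "$f")
        if [ -f "$run2/$name" ]; then
            # Each file is merged only with its namesake, so R1 and R2 stay separate.
            cat "$f" "$run2/$name" > "$out/$name"
        else
            echo "warning: $name not found in $run2, skipped" >&2
        fi
    done
}
```

Call it as merge_runs run1 run2 combined. Files present in only one run are reported and skipped rather than copied, so a missing sample is noticed instead of silently ending up with half the reads.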

ADD REPLY • link 7.5 years ago by chen ★ 2.5k

score 0 · Answer 1 · 2017-12-18

7.5 years ago

swbarnes2 15k

This is a little Perl script I use. It assumes that everything is in the same folder and that everything before the 'S\d+' is the sample name. It will cat together everything with the same name.

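use strict;
use warnings;
# Assumed setup (not shown in the original post): absolute paths given as arguments.
my ($dir, $genomeDir, $gtf) = @ARGV;
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @dir = readdir($dh);
my %hash;
foreach my $sample (@dir) {
    next unless $sample =~ /\.gz$/;
    # Everything before _S<n>_L<n>_R<n>_<nnn>.fastq.gz is the sample name.
    my ($shortname) = $sample =~ /(\S+)_S\d+_L\d+_R\d_\d\d\d\.fastq\.gz/;
    next unless defined $shortname;
    $hash{$shortname}++;
}
foreach my $key (keys %hash) {
    mkdir $key;
    # Glob for every file of this sample; "_S" stops sample1 from matching sample10.
    my $temp = $dir . "/" . $key . "_S*.gz";
    system("cd $key;  cat $temp | STAR --genomeDir $genomeDir --sjdbGTFfile $gtf --readFilesIn - --readFilesCommand zcat  --quantMode TranscriptomeSAM GeneCounts --outSAMunmapped Within  --outSAMtype BAM SortedByCoordinate --runThreadN 20 --limitBAMsortRAM 1001279989; cd ..;");
}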

ADD COMMENT • link 7.5 years ago by swbarnes2 15k

Since OP has not said anything about alignment it may be good to remove the STAR command line from the loop above.
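A minimal sketch of the same grouping without any alignment step, keeping R1 and R2 in separate output files. It assumes bcl2fastq-style names (<sample>_S<n>_L<lane>_R<1|2>_<nnn>.fastq.gz) with all files in one folder; the function name merge_by_sample and the output naming are made up for this example.

```shell
# Sketch: concatenate all pieces of each sample, per read direction, no alignment.
# ASSUMPTION: names look like <sample>_S<n>_L<lane>_R<1|2>_<nnn>.fastq.gz
# The output folder should start out empty, because files are appended to.
merge_by_sample() {
    local indir=$1 outdir=$2
    local f base sample mate
    mkdir -p "$outdir"
    # Globs expand in sorted order, so R1 and R2 pieces are appended in the same order.
    for f in "$indir"/*_R[12]_[0-9][0-9][0-9].fastq.gz; do
        [ -e "$f" ] || continue
        base=$(basename "$f")
        # Strip the _S<n>_L<n>_R<n>_<nnn>.fastq.gz suffix to get the sample name.
        sample=$(printf '%s\n' "$base" |
            sed -E 's/_S[0-9]+_L[0-9]+_R[12]_[0-9]{3}\.fastq\.gz$//')
        [ "$sample" != "$base" ] || continue   # name does not follow the pattern
        mate=$(printf '%s\n' "$base" |
            sed -E 's/.*_(R[12])_[0-9]{3}\.fastq\.gz$/\1/')
        # Concatenated gzip files form a valid multi-member gzip stream.
        cat "$f" >> "$outdir/${sample}_${mate}.fastq.gz"
    done
}
```

Call it as merge_by_sample /path/to/fastqs merged. Because gzip files can be concatenated directly, nothing has to be decompressed or recompressed along the way.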

ADD REPLY • link 7.5 years ago by GenoMax 151k

True, but he can plug in whatever application he intends to run in its place.

ADD REPLY • link 7.5 years ago by swbarnes2 15k

Thank you, I will try.

ADD REPLY • link 7.5 years ago by rim.klabi • 0