Merge fastq reads for several samples
7.0 years ago
rim.klabi • 0

Hello

I have 400 FASTQ files from different samples, sequenced in two runs. Both runs were on an Illumina HiSeq. How can I merge the .fastq files of both runs for each sample in one step? Of course, R1 and R2 have to stay separate. I know I can merge the .fastq files of both runs with cat, but that only handles one sample at a time, and I would have to repeat it for every sample. I have more than 100 samples. What command should I use? Do I need to prepare any folders?

Thank you for helping me

next-gen • 6.3k views

You'll need a for loop for this. To write that command we need to know how your files are named, i.e. which naming pattern distinguishes the samples, lanes and read direction.
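
For example, assuming bcl2fastq-style names such as sampleA_S1_L001_R1_001.fastq.gz (sample name, sample number, lane, read direction, chunk), a minimal bash sketch of pulling those fields out of a file name could look like this. The function name parse_fastq_name is hypothetical, and the pattern has to be adapted to whatever naming scheme your files actually use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a bcl2fastq-style file name
# <sample>_S<num>_L<lane>_R<read>_<chunk>.fastq.gz into its fields.
parse_fastq_name() {
    local f re='^(.+)_S([0-9]+)_L([0-9]{3})_R([12])_([0-9]{3})\.fastq\.gz$'
    f=$(basename "$1")
    if [[ $f =~ $re ]]; then
        # Print sample name, lane and read direction, tab-separated
        printf '%s\t%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "R${BASH_REMATCH[4]}"
    else
        echo "unrecognised name: $f" >&2
        return 1
    fi
}
```

Because the sample group is greedy and anchored to the _S<num>_L... suffix, sample names that themselves contain underscores are kept whole.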


Why are you merging reads again?


OP is not merging the reads themselves but concatenating the file pieces of each sample. In the past, bcl2fastq used to split its output into chunks of 2 million reads.
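
Since such chunks are just consecutive pieces of the same FASTQ, merging them is plain concatenation. A minimal bash sketch, assuming gzip-compressed chunks named <sample>_S<n>_L<lane>_R<read>_<chunk>.fastq.gz in one folder (merge_chunks is a hypothetical name). Concatenated gzip files form a valid multi-member gzip file, so the chunks do not need to be decompressed and recompressed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all chunks of one sample and read direction into one file.
# Usage: merge_chunks <dir> <sample> <R1|R2> <output.fastq.gz>
merge_chunks() {
    local dir=$1 sample=$2 mate=$3 out=$4
    local chunks=( "$dir/$sample"_S*_L*_"$mate"_*.fastq.gz )
    # An unmatched glob stays literal, so make sure at least one chunk exists
    if [[ ! -e ${chunks[0]} ]]; then
        echo "no chunks found for $sample $mate" >&2
        return 1
    fi
    # Glob results are sorted, so lanes/chunks are joined in a fixed order.
    # gzip members can be concatenated byte-wise: no decompression needed.
    cat "${chunks[@]}" > "$out"
}
```

For paired-end data, call it once with R1 and once with R2. Both globs expand in the same sorted lane/chunk order, so mates stay in matching positions in the two output files.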


Maybe I'm just being too literal here, but he seems to be saying that he's merging reads from two independent runs. If that's the case, he should probably treat each run as a technical replicate, or consider merging after correcting for batch effects.


I have to merge the reads from the two runs to increase the number of reads.


1. Prepare two folders, one per run, in which the same sample has the same file name in both runs, and create a combined folder.
2. Loop over the files in the first run's folder; for each file name you get, do step 3.
3. Run:
    cat run1/samplename.R1.fq run2/samplename.R1.fq > combined/samplename.R1.fq
    cat run1/samplename.R2.fq run2/samplename.R2.fq > combined/samplename.R2.fq
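
The three steps above can be sketched as a bash loop. This assumes, as in the recipe, one folder per run with files named <sample>.R1.fq and <sample>.R2.fq; the function name merge_runs is hypothetical. Samples missing a file in either run are skipped rather than half-merged, and R1 and R2 are always concatenated in the same run order so mates stay paired.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge run1 and run2 for every sample, keeping R1 and R2 separate.
# Assumes files are named <sample>.R1.fq and <sample>.R2.fq in both run folders.
merge_runs() {
    local run1=$1 run2=$2 out=$3 f sample r
    mkdir -p "$out"
    for f in "$run1"/*.R1.fq; do
        [[ -e $f ]] || continue                # no matching files: skip the literal glob
        sample=$(basename "$f" .R1.fq)
        # Skip samples that are incomplete in either run
        if [[ ! -e $run1/$sample.R2.fq || ! -e $run2/$sample.R1.fq || ! -e $run2/$sample.R2.fq ]]; then
            echo "skipping $sample: missing files" >&2
            continue
        fi
        for r in R1 R2; do
            # Same order (run1 then run2) for R1 and R2 keeps read pairs aligned
            cat "$run1/$sample.$r.fq" "$run2/$sample.$r.fq" > "$out/$sample.$r.fq"
        done
    done
}
```

Usage: merge_runs run1 run2 combined. For gzipped input the same loop works with .fq.gz suffixes, since gzip files can be concatenated directly.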

7.0 years ago

This is a little Perl script I use. It assumes that everything is in the same folder and that everything before the '_S\d+' part of the file name is the sample name. It concatenates everything with the same sample name and pipes it into STAR.

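```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Usage: perl script.pl <fastq_dir> <genomeDir> <gtf>
my ($dir, $genomeDir, $gtf) = (abs_path($ARGV[0]), @ARGV[1, 2]);
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my %hash;
foreach my $sample (readdir $dh) {
    # Sample name = everything before _S<n>_L<lane>_R<read>_<chunk>.fastq.gz
    my ($shortname) = $sample =~ /^(\S+)_S\d+_L\d+_R\d_\d\d\d\.fastq\.gz$/ or next;
    $hash{$shortname}++;
}
closedir($dh);
foreach my $key (keys %hash) {
    mkdir $key;
    # NOTE: all lanes AND both R1/R2 of this sample go into STAR as one single-end stream
    my $temp = "$dir/${key}_S*.gz";
    system("cd $key; cat $temp | STAR --genomeDir $genomeDir --sjdbGTFfile $gtf --readFilesIn - --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --runThreadN 20 --limitBAMsortRAM 1001279989");
}
```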
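
For the OP's case (concatenation only, with R1 and R2 kept apart), the same grouping idea can be sketched in bash without Perl or STAR. This assumes all files sit in one folder with bcl2fastq-style names <sample>_S<n>_L<lane>_R<read>_<chunk>.fastq.gz; files from the two runs would need distinct names to share one folder. The function name merge_by_sample and the <sample>_R1.fastq.gz output names are hypothetical, and the associative array requires bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one merged R1 and one merged R2 file per sample, built from
# a single folder of <sample>_S<n>_L<lane>_R<read>_<chunk>.fastq.gz files.
merge_by_sample() {
    local dir=$1 out=$2 f name sample r
    local -A seen=()                       # associative array: requires bash >= 4
    local re='^(.+)_S[0-9]+_L[0-9]+_R[12]_[0-9]+\.fastq\.gz$'
    mkdir -p "$out"
    # Collect the distinct sample names (everything before _S<n>)
    for f in "$dir"/*.fastq.gz; do
        name=$(basename "$f")
        [[ $name =~ $re ]] && seen[${BASH_REMATCH[1]}]=1
    done
    for sample in "${!seen[@]}"; do
        for r in R1 R2; do
            # Sorted glob: R1 and R2 chunks are concatenated in the same lane/chunk order.
            # The literal _S after the name stops "s1" from also matching "s10".
            cat "$dir/$sample"_S*_L*_"$r"_*.fastq.gz > "$out/${sample}_$r.fastq.gz"
        done
    done
}
```

Any downstream tool (STAR or otherwise) can then be run on the merged pairs in a separate loop, which keeps merging and analysis independent.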

Since OP has not said anything about alignment, it may be better to remove the STAR command from the loop above.


True, but he can plug in whatever application he intends to run in its place.


Thank you, I will try.
