Fastq Files From Different Flowcells
11.3 years ago
hellbio ▴ 520

Hi,

For a single sample, I have several paired-end FASTQ files from four different flowcells, i.e. FASTQ files from different lanes of each flowcell. Instead of processing the individual FASTQ files from each flowcell, can I merge all the forward reads (from different flowcells and lanes) into a single FASTQ file and all the reverse reads into another?

Thanks

fastq • 19k views
11.3 years ago

Yes, you can (see BruceB's answer), but that's usually a bad idea.

You can process the FASTQs in parallel, for example using make with the option -j (number of parallel jobs), and merge the SAM files later.



Well, I have FASTQ files from 8 lanes, i.e. 8 pairs of forward and reverse reads. If we map them individually, we will end up with 8 SAM files that have to be merged, which seems complex. So, would it be wise to concatenate the FASTQ files and then generate a single SAM/BAM file?


If time is not a problem, concatenate your FASTQs. If you can align the 8 pairs of FASTQ files, convert them to BAM and sort them as 8 jobs in parallel, then you'll get your result faster.
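To make the per-lane route concrete, here is a rough bash sketch of it: one background job per lane, each tagged with its own read group, followed by a merge. It assumes bwa (with `bwa mem`) and samtools 1.x are installed. The reference `ref.fa`, the sample name, the `<unit>_R1.fq.gz` / `<unit>_R2.fq.gz` naming and the function names are all made up for illustration, so adapt them to your own files.

```shell
#!/usr/bin/env bash
# Sketch only: file names, sample name and function names are hypothetical.
set -euo pipefail

# rg_line <id> <sample>: build an @RG header line for `bwa mem -R`.
# Tabs are written as the two characters "\t"; bwa expands them itself.
rg_line() {
    printf '@RG\\tID:%s\\tSM:%s\\tPU:%s\\tPL:ILLUMINA' "$1" "$2" "$1"
}

# align_lane <ref.fa> <R1.fq.gz> <R2.fq.gz> <rg> <out.bam>
# Align one lane's read pairs and write a coordinate-sorted BAM.
align_lane() {
    bwa mem -R "$4" "$1" "$2" "$3" | samtools sort -o "$5" -
}

# align_all <ref.fa> <sample> <unit>...
# Each <unit> (e.g. FC1_L1) is assumed to have <unit>_R1.fq.gz and <unit>_R2.fq.gz.
align_all() {
    local ref=$1 sample=$2; shift 2
    local unit pid pids=() bams=()
    for unit in "$@"; do
        align_lane "$ref" "${unit}_R1.fq.gz" "${unit}_R2.fq.gz" \
            "$(rg_line "$unit" "$sample")" "${unit}.bam" &
        pids+=("$!")
        bams+=("${unit}.bam")
    done
    # Wait for every lane; a failed lane aborts the script before merging.
    for pid in "${pids[@]}"; do wait "$pid"; done
    samtools merge "${sample}.bam" "${bams[@]}"
}
```

Something like `align_all ref.fa sample1 FC1_L1 FC1_L2 FC2_L1` would then run three alignments side by side. Because every lane carries its own read group, the lane of origin is still recorded on each read after the merge, which is lost if the FASTQ files are concatenated first.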

" it becomes so complex.." : why ? A makefile will solve your problems.


comment from @notSoJunkDNA ( https://twitter.com/notSoJunkDNA/status/365440417212276736 ) "doesn't apply to all pipelines. Tophat for instance needs all the reads..."


Could you please elaborate on how a makefile will solve the problem? Just curious...


With a makefile you can use something like $(foreach FASTQ,1 2 3 4 5 6 7 8,$(eval $(call alignwithbwa,$(FASTQ)))). See http://www.gnu.org/software/make/manual/html_node/Eval-Function.html


Could you please provide a sample makefile that you have been using? A makefile might make life easier for WGS data.


I would also like to mention that my data is paired-end data


Why is concatenating "usually a bad idea"? Is there anything else besides speed and read groups (RG)?


Is processing each lane separately faster than running bwa on the merged FASTQ file with the thread option (-t)? Which is faster, strategy A or B? I think the two strategies are equivalent.

Strategy A : Makefile

    bwa lane1.fastq
    bwa lane2.fastq 
    bwa lane3.fastq
    bwa lane4.fastq

Strategy B : merged

    bwa all.lane.fastq -t 4
11.3 years ago
BruceB ▴ 340

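Yes, you can. The simplest way of doing this is with 'cat' on the terminal, which concatenates the files you choose into one FASTQ file (concatenated gzip files still form a valid gzip file). List the forward and reverse files in the same order so that the read pairs stay in sync. E.g. cat R1_001.fq.gz R1_002.fq.gz ... R1_n.fq.gz > R1_combined.fq.gz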
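For what it's worth, here is a small bash sketch of that cat approach with a basic sanity check added. The function names and any file names you pass in are hypothetical, and the check only compares read counts, so it will catch a missing or truncated lane but not files listed in a different order.

```shell
#!/usr/bin/env bash
# Sketch only: function names are made up for illustration.
set -euo pipefail

# merge_fastq <out.fq.gz> <in1.fq.gz> [in2.fq.gz ...]
# Plain concatenation; gzip members placed back to back are a valid gzip stream.
merge_fastq() {
    local out=$1; shift
    cat "$@" > "$out"
}

# count_reads <file.fq.gz>: number of FASTQ records (4 lines per record).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# same_read_count <R1.fq.gz> <R2.fq.gz>: succeeds only if both files
# contain the same number of records.
same_read_count() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}
```

The idea is to call `merge_fastq` once for the forward files and once for the reverse files, passing the lanes in the same order both times, and then run `same_read_count` on the two outputs before mapping.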


So it can be done by concatenating all the forward reads to 1_fastq.gz and reverse reads to 2_fastq.gz and then mapping the paired-end files to a single bam file.


Yes, that is exactly what I would do (and have done in the recent past). Once concatenated, you would never know they came from different lanes.


Not exactly: the lane number is also recorded in the sequence identifier, see http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

Each entry in a FASTQ file consists of four lines:
• Sequence identifier
• Sequence
• Quality score identifier line (consisting of a +)
• Quality score

Each sequence identifier, the line that precedes the sequence and describes it, needs to be in the following format:

@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>
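As a quick illustration of that point, the flowcell and lane can be pulled back out of a concatenated file with a little bash and awk. This is only a sketch: it assumes every identifier follows the colon-separated layout quoted above (flowcell ID in the third field, lane in the fourth), and the function name is made up.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the colon-separated identifier layout quoted above.
set -euo pipefail

# list_lanes <file.fq.gz>
# Print every distinct "<flowcell ID>:<lane>" with its read count, tab-separated.
# Only every 4th line starting at line 1 is an identifier, so quality
# strings that happen to begin with '@' are never misread as headers.
list_lanes() {
    gzip -dc "$1" |
        awk -F: 'NR % 4 == 1 { n[$3 ":" $4]++ }
                 END { for (k in n) print k "\t" n[k] }' |
        sort
}
```

Running `list_lanes` on a merged file shows how many reads came from each flowcell and lane, so the origin is still recoverable from the read names even though the alignment itself no longer carries per-lane read groups.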
