Hi,
For a single sample, I have several paired-end FASTQ files from four different flowcells, i.e. FASTQ files from different lanes of each flowcell. Instead of processing the individual FASTQ files separately, can I merge all the forward reads (from the different flowcells and lanes) into a single FASTQ file and all the reverse reads into another FASTQ file?
Thanks
Well, I have FASTQ files from 8 lanes, i.e. 8 pairs of forward and reverse reads. If we map them individually, we will end up with 8 SAM files that have to be merged, which seems complex to manage. So, would it be wise to concatenate the FASTQ files and then generate a single SAM/BAM file?
If time is not a problem, concatenate your FASTQs. If you can align the 8 pairs of FASTQ files, convert to BAM and sort as 8 jobs in parallel, then you'll get your result faster.
"It becomes so complex...": why? A makefile will solve your problems.
comment from @notSoJunkDNA ( https://twitter.com/notSoJunkDNA/status/365440417212276736 ) "doesn't apply to all pipelines. Tophat for instance needs all the reads..."
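If you do go the concatenation route, the main thing to get right is that the forward and reverse files are joined in the same lane order, so that read N in one file still pairs with read N in the other. Here is a minimal Python sketch of that; the function names and the file layout are made up for illustration. It copies bytes, which also works for gzipped FASTQs because concatenated gzip files are still valid gzip.

```python
import shutil


def concatenate_fastqs(inputs, output):
    """Append each input file to `output`, in the order given.

    Inputs should be all gzipped or all plain text, not a mix.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_pairs(lanes, out_r1, out_r2):
    """Merge paired-end lanes into one forward and one reverse file.

    `lanes` is a list of (forward_path, reverse_path) tuples, one per lane.
    Both outputs are written in the same lane order to keep mates in sync.
    """
    concatenate_fastqs([r1 for r1, _ in lanes], out_r1)
    concatenate_fastqs([r2 for _, r2 in lanes], out_r2)
```

You would call it as merge_pairs([("L1_R1.fastq.gz", "L1_R2.fastq.gz"), ...], "sample_R1.fastq.gz", "sample_R2.fastq.gz"). Note that after merging you can no longer tell the lanes apart unless the read names carry that information.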
Could you please elaborate on how a makefile will solve the problem? Just curious...
With a makefile you can use something like $(foreach FASTQ,1 2 3 4 5 6 7 8,$(eval $(call alignwithbwa,${FASTQ}))), where alignwithbwa is a rule template you define yourself with define ... endef. See http://www.gnu.org/software/make/manual/html_node/Eval-Function.html
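This is not a makefile, but the same per-lane fan-out can be sketched in Python, which may make the idea easier to see: build one "align and sort" command per lane, run several at once, then merge the sorted BAMs. The function names, file names and the jobs default are illustrative assumptions, and the commands assume bwa mem and samtools 1.x syntax are available on your PATH.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def lane_command(reference, r1, r2, out_bam, threads=1):
    """Shell pipeline: align one lane with bwa mem, then coordinate-sort it."""
    q = shlex.quote
    return (
        f"bwa mem -t {threads} {q(reference)} {q(r1)} {q(r2)}"
        f" | samtools sort -o {q(out_bam)} -"
    )


def merge_command(out_bam, lane_bams):
    """Shell command that merges the sorted per-lane BAMs into one file."""
    return "samtools merge " + " ".join(shlex.quote(p) for p in [out_bam, *lane_bams])


def run_all(reference, lanes, merged_bam, jobs=4):
    """Align lanes concurrently, then merge.

    `lanes` is a list of (r1, r2, out_bam) tuples, one per lane.
    """
    commands = [lane_command(reference, r1, r2, bam) for r1, r2, bam in lanes]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Each worker thread only waits on an external process.
        # Caveat: without `set -o pipefail`, only the exit status of
        # samtools (the last command in the pipe) is checked.
        list(pool.map(lambda c: subprocess.run(c, shell=True, check=True), commands))
    merge = merge_command(merged_bam, [bam for _, _, bam in lanes])
    subprocess.run(merge, shell=True, check=True)
```

The advantage of make over a script like this is that make only reruns the lanes whose inputs changed, and make -j gives you the parallelism for free.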
Could you please provide a sample makefile that you have been using? A makefile might make life easier in the case of WGS data.
search github: https://gist.github.com/search?l=makefile&q=mpileup
I would also like to mention that my data is paired-end.
Why?
Anything else besides speed and RG (read groups)?
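On the RG point: when lanes are aligned separately, each lane can be given its own read group, so the flowcell and lane of every read stay recorded in the BAM after merging. Below is a small Python sketch that builds such a string for bwa's -R option. The helper name and the choice of flowcell.lane as the ID are assumptions for illustration, not a fixed convention.

```python
def read_group(sample, flowcell, lane, platform="ILLUMINA"):
    """Build an @RG line suitable for `bwa mem -R`.

    Fields are separated by a literal backslash followed by t (two
    characters), which bwa turns into a real tab in the SAM header.
    """
    unit = f"{flowcell}.{lane}"  # hypothetical naming scheme
    fields = [f"ID:{unit}", f"SM:{sample}", f"PL:{platform}", f"PU:{unit}"]
    return "@RG\\t" + "\\t".join(fields)
```

The returned string would be passed as a single quoted argument, e.g. bwa mem -R '<string>' ref.fa R1.fastq R2.fastq. All lanes of one sample share the same SM value but get different IDs.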
Is processing the lanes separately faster than running bwa on the merged FASTQ file with the thread option? Which one is faster, strategy A or B? I think those strategies are equivalent.
Strategy A: makefile (lanes aligned separately)
Strategy B: merged FASTQ