Hi guys, I have a total of 32 paired-end RNA-seq samples (Illumina). Each sample was sequenced on 4 lanes, so for each sample I have 4 forward and 4 reverse FASTQ files. Is it possible, and advisable, to merge the 4 forward files and the 4 reverse files for each sample and then run the QC analysis with FastQC? Or is it better to trim each FASTQ file independently and then merge?
Many thanks in advance!
Best
If you already have the files in pieces, you could brute-force parallelize trimming, alignment, etc. and then merge the BAM files at the end (before sorting/indexing). Otherwise, you can cat the R1 files and the R2 files (in the same order!) to generate a single larger R1 file and a single larger R2 file per sample.
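The "same order" point can be sketched in shell. This is a minimal sketch, not a definitive recipe: the file naming pattern `<sample>_L00<lane>_R1.fastq` / `<sample>_L00<lane>_R2.fastq` and the function name `merge_lanes` are assumptions for illustration, so adjust the glob to your own file names. It relies on the shell expanding globs in sorted order, so the lanes are concatenated in the same order for R1 and R2.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Concatenate all lanes of one sample into a single R1 and a single R2 file.
# Assumed (hypothetical) naming: <sample>_L00<lane>_R1.fastq and ..._R2.fastq
# Usage: merge_lanes SAMPLE INPUT_DIR OUTPUT_DIR
merge_lanes() {
    local sample="$1" indir="$2" outdir="$3"
    local read
    for read in R1 R2; do
        # The glob expands in sorted order (L001, L002, ...), so R1 and R2
        # are concatenated lane by lane in the same order.
        cat "$indir"/"${sample}"_L00?_"${read}".fastq \
            > "$outdir/${sample}_${read}.fastq"
    done
}
```

For example, `merge_lanes sampleA raw merged` would write `merged/sampleA_R1.fastq` and `merged/sampleA_R2.fastq`. Writing the output to a different directory keeps the merged files from being picked up by the input glob on a second run.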
By lines, do you mean cell lines? Or are those replicates for each sample? Your experimental setup isn't very clear here. Generally, I'd be against merging replicates, especially if you're trying to find differentially expressed genes between your various sample conditions: most programs use replicates as a way to drastically increase the statistical power of such analyses.
My guess is that lines should be lanes, as in sequencing lanes.
In that case, merging is fine.
Yes my mistake!! They are lanes (edited in my previous post). Thank you very much :)
Oh, that makes much more sense. Yes, I'd agree with WouterDeCoster then: merging the F+R FastQs before QC should be fine.
I didn't mean merge F+R. I meant merge F+F+F+F and R+R+R+R, do the QC on the new F and the new R, and then sort and merge the F+R.
By merge, do you mean concatenating technical replicates from the same sample? I would argue you should run QC on the files separately, to check for possible batch effects, and merge only after making sure no sizable batch effects are present.
Or by "merge" do you mean merging R1+R2 with a program like BBMerge, FLASH or PEAR?
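As a rough first pass before concatenating, the per-lane files can be compared with a few lines of shell. This is only a sketch and not a substitute for running FastQC on each lane: it prints the read count of R1 and R2 for every lane, so that a lane with a very different yield or with unequal R1/R2 counts stands out. The naming pattern `<sample>_L00<lane>_R1.fastq` and the function names `count_reads` and `lane_report` are assumptions for illustration, and the files are assumed to be uncompressed with 4 lines per record.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Number of reads in an uncompressed FASTQ file (assumes 4 lines per record).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Print one tab-separated line per lane: R1 file name, R1 reads, R2 reads.
# Returns non-zero if any lane has different R1 and R2 read counts.
# Assumed (hypothetical) naming: <sample>_L00<lane>_R1.fastq and ..._R2.fastq
# Usage: lane_report SAMPLE INPUT_DIR
lane_report() {
    local sample="$1" indir="$2" status=0
    local r1 r2 n1 n2
    for r1 in "$indir"/"${sample}"_L00?_R1.fastq; do
        r2="${r1%_R1.fastq}_R2.fastq"
        n1=$(count_reads "$r1")
        n2=$(count_reads "$r2")
        printf '%s\t%s\t%s\n' "$(basename "$r1")" "$n1" "$n2"
        [ "$n1" -eq "$n2" ] || status=1
    done
    return "$status"
}
```

Equal read counts say nothing about quality differences between lanes, so the per-lane FastQC reports are still the place to look for those.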
By merge I mean
cat *R1_.fastq > big_R1.fastq
and
cat *R2_.fastq > big_R2.fastq
not merging forward and reverse in that step.