Question

When to merge sequencing data from multiple lanes (FastQToSam, SamToFastq, BWA, MergeBamFiles, or additional step)?

1

3.5 years ago

Jordi ▴ 60

Hi,

I am following GATK's Best Practices workflow for germline short variant discovery in single samples. The pipeline consists of the following steps:

FastqToSam
MarkIlluminaAdapters
SamToFastq
bwa mem
MergeBamAlignment
MarkDuplicates
BaseRecalibrator
ApplyBQSR
ValidateSamFile
HaplotypeCaller

I work with paired-end sequencing data, and most samples have one forward-read FASTQ file and one reverse-read FASTQ file. However, for a couple of samples the sequencing data is split across multiple lanes. Most commands in the pipeline do not seem to accept multi-lane input, only one forward and one reverse file (or one unmapped BAM and one aligned BAM for MergeBamAlignment). Should I merge all forward and all reverse FASTQ files before starting the pipeline (the quality of each dataset seems comparable according to FastQC/MultiQC), or only later (and, if so, at which step)?

Thanks for your input.

alignment pipeline ngs picard sequencing • 2.6k views

updated 3.5 years ago by lieven.sterck 15k • written 3.5 years ago by Jordi ▴ 60

score 0 · Answer 1 · 2021-07-25

0

3.5 years ago

lieven.sterck 15k

You definitely need to keep your samples separate.

To merge data from the same sample that was sequenced across multiple lanes, you can simply cat the files together, forward with forward and reverse with reverse (in the same order!). In the end you will have one forward file and one reverse file per sample.
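For illustration, here is a minimal Python sketch of that cat-style merge. The file naming (`sample_L001_R1.fastq.gz` and so on) and the helper names `merge_lanes` and `merge_sample` are assumptions made up for this example, not part of any tool; adjust the glob patterns to your own naming. It relies on the fact that gzip files concatenated byte for byte still form a valid gzip file.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Append the given files byte for byte, in order (what `cat` does)."""
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_sample(sample_dir, sample, out_dir):
    """Merge all lanes of one sample, R1 with R1 and R2 with R2.

    Assumes hypothetical names like <sample>_L001_R1.fastq.gz.
    """
    sample_dir, out_dir = Path(sample_dir), Path(out_dir)
    # Sorting both lists puts the lanes in the same order for R1 and R2.
    r1 = sorted(sample_dir.glob(f"{sample}_L*_R1*.fastq.gz"))
    r2 = sorted(sample_dir.glob(f"{sample}_L*_R2*.fastq.gz"))
    if not r1 or len(r1) != len(r2):
        raise ValueError(f"lane files for {sample} are missing or unpaired")
    merge_lanes(r1, out_dir / f"{sample}_R1.fastq.gz")
    merge_lanes(r2, out_dir / f"{sample}_R2.fastq.gz")
    return r1, r2
```

Calling `merge_sample("raw", "sampleA", "merged")` (hypothetical paths, with the output directory already created) would write `merged/sampleA_R1.fastq.gz` and `merged/sampleA_R2.fastq.gz`, and it returns the lane files used so you can confirm the order.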

3.5 years ago by lieven.sterck 15k

1

Hi lieven.sterck,

Thanks for the reply. Of course, I will keep samples separate. I was just wondering what best practice dictates as to when (at which step of the pipeline) to merge the results from the various lanes.

3.5 years ago by Jordi ▴ 60

1

I would say step 1. Personally, I merge all lanes of a sample into a single file before I do anything else with them. (Splitting a sample across lanes is just a technical detail of the sequencing run, so you can harmlessly cat the files together. Keep the files in sync, though!)
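One way to check the "in sync" part after merging is to walk both files in parallel and compare read names. This is only a sketch: `read_names` and `check_in_sync` are hypothetical helpers, and the code assumes well-formed, gzipped, four-line FASTQ records whose names either match exactly or differ only by an old-style `/1` or `/2` suffix.

```python
import gzip
from itertools import zip_longest


def read_names(fastq_gz):
    """Yield the read name from each 4-line FASTQ record."""
    with gzip.open(fastq_gz, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0 and line.strip():
                # Drop the leading '@' and anything after the first whitespace.
                name = line[1:].split()[0]
                # Drop an old-style /1 or /2 mate suffix if present.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def check_in_sync(r1_path, r2_path):
    """Return the number of pairs; raise if names or record counts differ."""
    n = 0
    for n1, n2 in zip_longest(read_names(r1_path), read_names(r2_path)):
        # zip_longest pads the shorter file with None, so a count mismatch
        # also shows up as a name mismatch.
        if n1 != n2:
            raise ValueError(f"pair {n + 1} out of sync: {n1!r} vs {n2!r}")
        n += 1
    return n
```

Running `check_in_sync` on the merged forward and reverse files before starting the pipeline should catch lanes that were concatenated in a different order for the two read directions.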

3.5 years ago by lieven.sterck 15k