Hello everyone,
I currently use the following upstream workflow for NGS data without UMIs:
a. FASTQs
b. Remove sequence adapters
c. Alignment using BWA MEM
d. Index BAM
e. Picard MarkDuplicates to remove duplicate reads
f. Recalibrate BAM
g. CollectHsMetrics
h. Variant calling tools downstream.
However, I am now trying to modify the above pipeline to include UMIs, and the workflow suggested to me was:
- FASTQs to unaligned BAMs
- Extract UMI bases as an unaligned BAM tag
- Unaligned BAM to FASTQ
- Alignment using BWA MEM
- Merge aligned BAM with unaligned BAM that contains UMI tags
- Group reads by UMI
- Call consensus reads
- Align duplex consensus reads (unaligned consensus reads to FASTQ, then alignment using BWA MEM)
- CollectHsMetrics
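As a rough illustration of what the UMI-specific steps above do conceptually (extract the UMI, group reads by UMI, call a consensus), here is a toy Python sketch. It assumes the UMI is the first `umi_len` bases of the read, which depends on the library prep. The function names and the read dictionaries are hypothetical, and the plain majority vote is a deliberate simplification: real consensus callers are quality-aware and work on BAM records, where the UMI is stored in a tag (commonly `RX`).

```python
from collections import Counter, defaultdict


def extract_umi(seq, qual, umi_len):
    """Split a read into (umi, remaining_seq, remaining_qual).

    Assumption: the UMI is inline at the start of the read.
    """
    return seq[:umi_len], seq[umi_len:], qual[umi_len:]


def group_by_umi(reads):
    """Group read sequences that share chromosome, start position, and UMI."""
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read["umi"])
        groups[key].append(read["seq"])
    return dict(groups)


def majority_consensus(seqs):
    """Per-position majority vote over equal-length sequences (toy model only)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


# Made-up reads: three share a UMI and position, one has a different UMI.
reads = [
    {"chrom": "chr1", "pos": 100, "umi": "ACGT", "seq": "TTGCA"},
    {"chrom": "chr1", "pos": 100, "umi": "ACGT", "seq": "TTGCA"},
    {"chrom": "chr1", "pos": 100, "umi": "ACGT", "seq": "TTGGA"},  # one mismatch
    {"chrom": "chr1", "pos": 100, "umi": "GGCC", "seq": "TTGCA"},
]

for key, seqs in group_by_umi(reads).items():
    print(key, len(seqs), majority_consensus(seqs))
```

The point of the sketch is only that consensus calling collapses each UMI group into a single read, which is why the consensus reads have to be aligned again afterwards.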
I am having trouble understanding how to merge these two workflows, specifically the order in which the steps should run. Would a combined workflow look something like this?
- ( 1 ) FASTQs to unaligned BAMs
- ( 2 ) Extract UMI bases as an unaligned BAM tag
- ( 3 ) Unaligned BAM to FASTQ
- ( b ) Remove sequence adapters
- ( c/4 ) Alignment using BWA MEM
- ( d ) Index BAM
- ( 5 ) Merge aligned BAM with unaligned BAM that contains UMI tags
- ( 6 ) Group reads by UMI
- ( 7 ) Call consensus reads
- ( 8 ) Align duplex consensus reads (unaligned consensus reads to FASTQ, then alignment using BWA MEM)
- ( f ) Recalibrate BAM
- ( g/9 ) CollectHsMetrics
- ( h ) Variant calling tools downstream.
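To make the proposed ordering easier to reason about, the combined list above can be written down as data and checked for one narrow property: that no step runs before the files it needs exist. This is a minimal sketch, not a pipeline. The artifact names (`ubam_umi`, `aligned_bam`, and so on) are labels invented for illustration, and the inputs and outputs assigned to each step are assumptions about what that step reads and writes. The check says nothing about whether the order is the best choice scientifically.

```python
# Each step: (label, description, inputs, outputs).
# Artifact names are illustrative labels, not real file names.
PROPOSED = [
    ("1", "FASTQs to unaligned BAMs", {"raw_fastq"}, {"ubam"}),
    ("2", "Extract UMI bases as an unaligned BAM tag", {"ubam"}, {"ubam_umi"}),
    ("3", "Unaligned BAM to FASTQ", {"ubam_umi"}, {"fastq"}),
    ("b", "Remove sequence adapters", {"fastq"}, {"trimmed_fastq"}),
    ("c/4", "Alignment using BWA MEM", {"trimmed_fastq"}, {"aligned_bam"}),
    ("d", "Index BAM", {"aligned_bam"}, {"aligned_bai"}),
    ("5", "Merge aligned BAM with UMI-tagged unaligned BAM",
     {"aligned_bam", "ubam_umi"}, {"merged_bam"}),
    ("6", "Group reads by UMI", {"merged_bam"}, {"grouped_bam"}),
    ("7", "Call consensus reads", {"grouped_bam"}, {"consensus_ubam"}),
    ("8", "Align consensus reads", {"consensus_ubam"}, {"consensus_bam"}),
    ("f", "Recalibrate BAM", {"consensus_bam"}, {"recal_bam"}),
    ("g/9", "CollectHsMetrics", {"recal_bam"}, {"hs_metrics"}),
    ("h", "Variant calling", {"recal_bam"}, {"variants"}),
]


def check_order(steps, initial):
    """Return (label, missing_inputs) for every step whose inputs are not yet available."""
    available = set(initial)
    problems = []
    for label, _description, inputs, outputs in steps:
        missing = inputs - available
        if missing:
            problems.append((label, sorted(missing)))
        available |= outputs
    return problems


# An empty list means every step finds its inputs already produced.
print(check_order(PROPOSED, {"raw_fastq"}))
```

Moving a step out of place, for example putting the merge (5) before the alignment (c/4), makes `check_order` report the missing `aligned_bam`, which is a quick way to test alternative orderings on paper before building them.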
Any advice on how to proceed would be appreciated!
Thank you
I've reformatted your post to make it easier to read. The italics and bold text without list formatting were difficult to follow.