Hello,
I am trying to call variants on data from a panel that uses UMI adapters. As with Illumina's TSO500 pipeline, UMI-tagged DNA fragments are amplified, and a tool such as fgbio's CallDuplexConsensusReads can then collapse all reads sharing a given UMI into a consensus read (to eliminate PCR errors). For example, 5 forward reads and 5 reverse reads can be collapsed into a forward and a reverse consensus read, respectively, which are then collapsed into a single duplex consensus read. Ideally, at least 1 forward read and 1 reverse read would contribute to each duplex consensus read, which can be enforced by setting the min-reads parameter of CallDuplexConsensusReads to 2 1 1 (2 reads total, 1 fwd, 1 rev).
We had some lower-quality data, and the supplier of the panel told us to run CallDuplexConsensusReads with min-reads set to 1 1 0, meaning that a single read (either forward or reverse) is enough to produce a consensus read. The result is that most reads are retained, and the output is a mix of what are basically raw reads and UMI-collapsed consensus reads, some of which are built from only a forward or only a reverse consensus read rather than from both strands.
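To make the difference between the two settings concrete, here is a minimal Python sketch of how I understand the three min-reads values to be applied to one UMI family. This is my reading of the fgbio documentation, not fgbio's actual code, and the function and argument names are made up for illustration.

```python
def passes_min_reads(fwd_reads, rev_reads, min_reads=(2, 1, 1)):
    """Return True if a UMI family meets the three-value min-reads thresholds.

    Assumed semantics (my understanding, not fgbio's source code):
      min_reads[0] -> minimum reads in total across both strands
      min_reads[1] -> minimum reads on the strand with more reads
      min_reads[2] -> minimum reads on the strand with fewer reads
    fwd_reads / rev_reads are the raw read counts per strand of the
    original molecule (fgbio refers to these strands as AB and BA).
    """
    min_total, min_more, min_fewer = min_reads
    more = max(fwd_reads, rev_reads)
    fewer = min(fwd_reads, rev_reads)
    return (
        fwd_reads + rev_reads >= min_total
        and more >= min_more
        and fewer >= min_fewer
    )


# A family with reads from both strands passes either setting.
print(passes_min_reads(5, 5, (2, 1, 1)))
print(passes_min_reads(5, 5, (1, 1, 0)))

# A single read from one strand only passes the relaxed 1 1 0 setting.
print(passes_min_reads(1, 0, (2, 1, 1)))
print(passes_min_reads(1, 0, (1, 1, 0)))
```

Under these assumptions, 2 1 1 keeps only families with support on both strands, while 1 1 0 also lets single-strand families and lone reads through.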
Using CallDuplexConsensusReads with min-reads set to 2 1 1 means that only a fraction of the original reads is used: e.g. out of 130 million reads, 16 million are collapsed into ~500k consensus reads with at least 1 fwd and 1 rev read. The mean UMI-collapsed coverage of targets (by sample) then ranges from 50 to 127, but these are all "true" duplex consensus reads. Some individual targets also have relatively good UMI-collapsed coverage (25-800).
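As a rough sanity check on those numbers, here is a small Python calculation using only the figures quoted above. It assumes raw reads and consensus reads are counted in the same units, and the ~500k count is approximate.

```python
# Figures quoted above for the 2 1 1 setting (consensus count is approximate).
total_raw_reads = 130_000_000
reads_in_duplex_families = 16_000_000
duplex_consensus_reads = 500_000

# Share of the raw reads that end up contributing to a duplex consensus read.
fraction_used = reads_in_duplex_families / total_raw_reads

# Average number of raw reads collapsed into each duplex consensus read.
mean_reads_per_consensus = reads_in_duplex_families / duplex_consensus_reads

print(f"Fraction of raw reads used: {fraction_used:.1%}")
print(f"Mean raw reads per consensus read: {mean_reads_per_consensus:.0f}")
```

So only about 12% of the raw reads contribute to duplex consensus reads, at roughly 32 raw reads per consensus read.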
In summary, for matched tumour-normal variant calling, would it be best to use the raw reads, the "true" collapsed reads (from min-reads set to 2 1 1), or the mix of raw and collapsed reads (from min-reads set to 1 1 0)? The 1 1 0 setting was suggested as an interim solution for the low-quality data, but I feel that calling variants on a mix of raw and collapsed reads is not ideal.