Question

How to Generate PacBio HiFi Reads with Ground Truth Using PBSIM3?

24 days ago

DBPZ • 0

We are benchmarking long-read aligners, which requires simulated PacBio HiFi reads. We understand that PBSIM3 can simulate the raw multi-pass subreads, which PacBio's ccs software then turns into the final HiFi reads. We generated HiFi reads using this approach.

However, there is a challenge regarding the "ground truth" for the output HiFi reads. The multiple subreads contributing to a single HiFi read can have slightly different base-by-base alignments (caused by different indels in each subread) to the chromosome from which they were extracted. ccs resolves these differences by consensus, but it does not report a base-by-base alignment for its output. Thus there is no single ground truth that allows an unambiguous, base-by-base comparison against the mapping result for that HiFi read.
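To make the issue concrete, here is a minimal Python sketch. The CIGAR strings and the start coordinate are made up for illustration; they are not PBSIM3 or ccs output. It maps each base of two hypothetical passes over the same template to a reference position and shows that the per-base coordinates disagree once the two passes carry different indels.

```python
import re

def ref_positions(cigar, ref_start):
    """Map each read base to a 0-based reference position (None for bases
    that are inserted or soft-clipped and so have no reference position)."""
    positions = []
    ref = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":        # consumes both read and reference
            positions.extend(range(ref, ref + n))
            ref += n
        elif op in "IS":       # consumes read only
            positions.extend([None] * n)
        elif op in "DN":       # consumes reference only
            ref += n
        # H and P consume neither read nor reference
    return positions

# Hypothetical example: two passes over the same template, both starting
# at reference position 100, but with different simulated indels.
subread_a = ref_positions("4M1I5M", 100)  # one inserted base after 4 matches
subread_b = ref_positions("6M1D3M", 100)  # one deleted base after 6 matches

for i, (a, b) in enumerate(zip(subread_a, subread_b)):
    flag = "" if a == b else "  <- passes disagree"
    print(f"read base {i}: pass A -> {a}, pass B -> {b}{flag}")
```

The two passes agree on the first four bases and then diverge, so there is no obvious way to say which reference position a given base of the consensus read "truly" came from.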

We then reviewed the literature. Recent journal articles that benchmark long-read aligners either used the ccs mode of PBSIM1 (LRA) or ran PBSIM2 in its CLR (Continuous Long Read) mode (Winnowmap2 and BLEND) to simulate PacBio reads. We have not found any article that used PBSIM2 or PBSIM3 to generate subreads, ran ccs to produce HiFi reads, and still had the ground truth.

Our aim is to accurately simulate PacBio HiFi reads, which have distinct characteristics and error profiles compared to CLR reads. For this, we still hope to use the most realistic models available for subread generation, those in PBSIM3, and then create HiFi reads with ccs. Is there any way to obtain ground truth for these HiFi reads?

pacbio pbsim pbsim3 • 372 views