How to Generate PacBio HiFi Reads with Ground Truth Using PBSIM3?
24 days ago
DBPZ • 0

We are benchmarking long-read aligners, which requires simulated PacBio HiFi reads. As we understand it, PBSIM3 can simulate the raw multi-pass subreads, and PacBio's ccs software then produces the final HiFi reads from them. We generated HiFi reads using this approach.

However, establishing the "ground truth" for the output HiFi reads is a challenge. The multiple subreads contributing to a single HiFi read can have slightly different base-by-base alignments (because each carries different indels) to the chromosome from which they were sampled. ccs resolves these differences by consensus, but it does not report a base-by-base alignment for its output. Thus there is no single ground truth that allows an unambiguous, base-by-base comparison against the mapping result for that HiFi read.

We then reviewed the literature. Recent journal articles on long-read aligner benchmarking simulated PacBio reads either with the ccs mode of PBSIM1 (LRA) or with PBSIM2 in CLR (Continuous Long Read) mode (Winnowmap2 and BLEND). We have not found any article that used PBSIM2 or PBSIM3 to generate subreads, ran ccs to produce HiFi reads, and still had the ground truth.

Our aim is to accurately simulate PacBio HiFi reads, which have distinct characteristics and error profiles compared to CLR reads. For this, we would still like to use the most realistic subread-generation models available, those in PBSIM3, and then create HiFi reads with ccs. Is there any way to obtain ground truth for these HiFi reads?
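
To make the question concrete, here is a minimal sketch of a coarser, interval-level fallback, not the base-by-base truth we are asking about. It assumes the reference interval of each subread can be recovered from the simulator output; the `Interval` tuples, the function names, and the 10% overlap threshold are all hypothetical choices for illustration and do not reflect PBSIM3's or ccs's actual output formats. Strand is ignored here.

```python
from statistics import median
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def consensus_interval(subreads: List[Interval]) -> Interval:
    """Collapse the subread intervals of one ZMW into one truth interval.

    Uses the median start and median end, so a single subread with an
    unusual boundary does not dominate the result.
    """
    if not subreads:
        raise ValueError("no subreads given")
    chroms = {s.chrom for s in subreads}
    if len(chroms) != 1:
        raise ValueError("subreads of one ZMW should share a chromosome")
    start = int(median(s.start for s in subreads))
    end = int(median(s.end for s in subreads))
    return Interval(chroms.pop(), start, end)


def is_correct(truth: Interval, mapped: Interval,
               min_overlap_frac: float = 0.1) -> bool:
    """Call a mapping correct if overlap / union reaches the threshold.

    The 0.1 default is an assumption made for this sketch.
    """
    if truth.chrom != mapped.chrom:
        return False
    overlap = min(truth.end, mapped.end) - max(truth.start, mapped.start)
    if overlap <= 0:
        return False
    union = max(truth.end, mapped.end) - min(truth.start, mapped.start)
    return overlap / union >= min_overlap_frac


# Hypothetical subread intervals for a single ZMW
subreads = [
    Interval("chr1", 1000, 11000),
    Interval("chr1", 1002, 10995),
    Interval("chr1", 998, 11003),
]
truth = consensus_interval(subreads)
print(truth, is_correct(truth, Interval("chr1", 1010, 10990)))
```

This only tells us whether a HiFi read landed in the right place. It does not resolve the per-base ambiguity described above, which is what we would ideally like to evaluate.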

pacbio pbsim pbsim3 • 372 views