Question

PacBio - reads with Q<20

0

2.6 years ago

pingu77 ▴ 20

Hi all,

I am fairly new to PacBio data analysis and I have a question:

Why do I need to extract the HiFi reads? What do the other reads, the ones with a mean quality of Q<20, represent? Should I simply ignore those reads? If I include them in the analysis, will the results be reliable?

Thank you for your time!

pacbio hifi quality • 988 views

score 1 · Answer 1 · 2022-04-02

1

2.6 years ago

Billy Rowell ▴ 330

This really depends on what you plan to do with the data downstream. Most current downstream applications expect HiFi (>=Q20) data. For instance, if you're going to generate a _de novo_ assembly with hifiasm or call small variants with DeepVariant, including the <Q20 reads will cause problems with accuracy, memory usage, and runtime. For detecting structural variation with pbsv, if you use the correct parameters, you might get some added value from the <Q20 reads.
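To make the Q20 cutoff concrete, here is a minimal sketch (not part of the original answer) of how a Phred-scaled threshold maps onto predicted read accuracy and how reads could be split on it. It assumes each read carries a predicted accuracy between 0 and 1, as PacBio CCS BAM files generally store in the `rq` tag; the read names, the accuracy values, and the helper names `min_accuracy_for_q` and `split_by_quality` are hypothetical.

```python
HIFI_MIN_Q = 20  # the HiFi cutoff mentioned above (>=Q20)


def min_accuracy_for_q(q: float) -> float:
    """Return the predicted accuracy that corresponds to Phred quality q.

    Phred quality is Q = -10 * log10(error probability), so the error
    probability at quality q is 10 ** (-q / 10).
    """
    return 1.0 - 10 ** (-q / 10)


def split_by_quality(reads, min_q: float = HIFI_MIN_Q):
    """Split (name, accuracy) pairs into HiFi and sub-threshold read names.

    `reads` is an iterable of (read_name, predicted_accuracy) tuples.
    In practice the accuracy would be read from the BAM record; here it
    is passed in directly so the example stays self-contained.
    """
    threshold = min_accuracy_for_q(min_q)
    hifi, low = [], []
    for name, accuracy in reads:
        if accuracy >= threshold:
            hifi.append(name)
        else:
            low.append(name)
    return hifi, low


if __name__ == "__main__":
    # Hypothetical reads: (name, predicted accuracy)
    example_reads = [("read_a", 0.999), ("read_b", 0.95), ("read_c", 0.9995)]
    hifi_reads, low_reads = split_by_quality(example_reads)
    print("HiFi:", hifi_reads)
    print("Below Q20:", low_reads)
```

Comparing in accuracy space rather than converting every read to a Phred value avoids floating-point rounding right at the Q20 boundary. Q20 corresponds to a predicted accuracy of 99%, that is, about one expected error per 100 bases.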

0

Thanks for your answer! But why are there so many reads with Q<20? I wasn't able to find this information. Also, I would like to look at the repeat regions, and I think that if I include reads with Q<20 I will run into memory usage problems.

ADD REPLY • link 2.6 years ago by pingu77 ▴ 20