Question

Converting bax.h5 to CCS reads


5.7 years ago

firestar ★ 1.7k

I have some PacBio reads. That's really the only information I have: no version number or anything. Each sample has four bax.h5 files and one bas.h5 file. Is it possible to convert these to consensus (CCS) reads?

I tried running the ccs tool from pbccs. It runs, but I get the report below:

ZMW Yield
Success (without retry) -- CCS generated,1862,2.38%
Success (with retry)    -- CCS generated,134,0.17%
Failed -- Below SNR threshold,35869,45.89%
Failed -- No usable subreads,2072,2.65%
Failed -- Insert size too long,2,0.00%
Failed -- Insert size too small,0,0.00%
Failed -- Not enough full passes,38157,48.82%
Failed -- Too many unusable subreads,0,0.00%
Failed -- CCS did not converge,0,0.00%
Failed -- CCS below minimum predicted accuracy,70,0.09%
Failed -- Unknown error during processing,0,0.00%

So there are very few CCS reads in the end.
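To put a single number on "very few", here is a minimal Python sketch that parses the "label,count,percent" lines of the report above and sums the two Success rows. The line format is assumed from this one report; other ccs versions may print it differently.

```python
# Sketch: parse the ccs ZMW yield report shown above and compute the overall CCS yield.
# Assumes each data line looks like "label,count,percent" (format taken from this report).
REPORT = """\
ZMW Yield
Success (without retry) -- CCS generated,1862,2.38%
Success (with retry)    -- CCS generated,134,0.17%
Failed -- Below SNR threshold,35869,45.89%
Failed -- No usable subreads,2072,2.65%
Failed -- Insert size too long,2,0.00%
Failed -- Insert size too small,0,0.00%
Failed -- Not enough full passes,38157,48.82%
Failed -- Too many unusable subreads,0,0.00%
Failed -- CCS did not converge,0,0.00%
Failed -- CCS below minimum predicted accuracy,70,0.09%
Failed -- Unknown error during processing,0,0.00%
"""


def parse_zmw_yield(report: str) -> dict[str, int]:
    """Map each outcome label to its ZMW count."""
    counts: dict[str, int] = {}
    for line in report.splitlines():
        parts = line.rsplit(",", 2)
        if len(parts) != 3:
            continue  # header ("ZMW Yield") or blank line
        label, count, _percent = parts
        counts[label.strip()] = int(count)
    return counts


def ccs_yield(counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (successful ZMWs, total ZMWs, percent successful)."""
    total = sum(counts.values())
    success = sum(n for label, n in counts.items() if label.startswith("Success"))
    return success, total, 100.0 * success / total


if __name__ == "__main__":
    success, total, pct = ccs_yield(parse_zmw_yield(REPORT))
    print(f"{success} of {total} ZMWs produced CCS reads ({pct:.2f}%)")
```

For this report it prints "1996 of 78166 ZMWs produced CCS reads (2.55%)". The two largest failure rows (SNR and full passes) account for roughly 95% of ZMWs.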

I also tried bax2bam, which has a --ccs option. That option seems to expect an additional ccs.h5 file, which I don't have.

Running bax2bam -o sample *.bax.h5 --ccs gives this error: "ERROR, Could not initialize ccs file". Without the --ccs option, however, the same command works fine and produces perfectly good .bam files.
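Before passing --ccs, it can help to check which kinds of PacBio HDF5 files are actually present. The sketch below groups file names by their .bas.h5 / .bax.h5 / .ccs.h5 suffix. It assumes, as suspected above, that --ccs needs *.ccs.h5 input; the file names in the test are hypothetical.

```python
# Sketch: inventory PacBio RS HDF5 files by suffix before choosing bax2bam options.
# Assumption (from the question): bax2bam --ccs needs *.ccs.h5 files to be present.
from pathlib import Path

H5_KINDS = ("bas", "bax", "ccs")


def classify_h5(names: list[str]) -> dict[str, list[str]]:
    """Group file names ending in .bas.h5 / .bax.h5 / .ccs.h5 by kind."""
    groups: dict[str, list[str]] = {kind: [] for kind in H5_KINDS}
    for name in sorted(names):
        for kind in H5_KINDS:
            if name.endswith(f".{kind}.h5"):
                groups[kind].append(name)
    return groups


def h5_inventory(directory: str = ".") -> dict[str, list[str]]:
    """Inventory the PacBio HDF5 files in one directory."""
    return classify_h5([p.name for p in Path(directory).glob("*.h5")])


if __name__ == "__main__":
    inv = h5_inventory()
    print({kind: len(files) for kind, files in inv.items()})
    if not inv["ccs"]:
        print("No *.ccs.h5 files found; bax2bam --ccs will likely fail here.")
```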

Is it possible that there are no CCS reads here? Or am I doing something wrong?

UPDATE: FastQC sequence length distribution based on filtered subreads for this sample.


score 1 · Answer 1 · 2019-07-29

You can prepare at least two different types of PacBio libraries:

1) Long fragments whose subreads are used for assembly (often longer than can be sequenced in a single movie)

2) Relatively short inserts for CCS (each insert is sequenced several times within one polymerase read, yielding multiple subreads)

I think PacBio's ccs requires at least 3 full passes by default. You might want to increase that to 5 passes for some applications. However, the message "Not enough full passes,38157,48.82%" makes me think that either your inserts were too long (perhaps the library was intended for another purpose) or your movie time was not long enough to get at least 3 full passes per insert.
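The insert-length versus movie-time argument can be made concrete with rough arithmetic: a polymerase read alternates insert and adapter sequence, so the number of full passes is about the polymerase read length divided by (insert + adapter) length, minus one to allow for partial subreads at each end. In this sketch the 45 bp adapter length and the 20 kb polymerase read are illustrative assumptions, not measured values.

```python
# Sketch: conservative estimate of full passes per ZMW.
# adapter_len=45 and the 20 kb polymerase read below are illustrative assumptions.
def estimate_full_passes(polymerase_len: int, insert_len: int,
                         adapter_len: int = 45) -> int:
    """Estimate full passes over one insert within one polymerase read.

    The read is modelled as repeating (insert + adapter) units; one unit is
    subtracted because the first and last subreads are usually partial.
    """
    period = insert_len + adapter_len
    return max(0, polymerase_len // period - 1)


MIN_PASSES = 3  # default mentioned in the answer; check your ccs version

if __name__ == "__main__":
    # Hypothetical 20 kb polymerase read with two insert sizes.
    for insert in (2_000, 5_000):
        passes = estimate_full_passes(20_000, insert)
        verdict = "enough" if passes >= MIN_PASSES else "not enough"
        print(f"{insert} bp insert: ~{passes} full passes ({verdict})")
```

Under these assumptions a 2 kb insert gets about 8 full passes, while a 5 kb insert gets only about 2, which would land in "Not enough full passes".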

I'm not sure how big of a deal the "Below SNR threshold,35869,45.89%" message is, but you can use FastQC to plot the read length distribution if you create subread FASTQ files (not CCS FASTQ files). Alternatively, you may already have been given this plot, or you can use another PacBio program to generate it.
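If you want the numbers behind such a plot, a small Python sketch can bin read lengths from a subread FASTQ directly. It assumes unwrapped 4-line FASTQ records; the 1 kb bin size is arbitrary and the file name in the usage comment is hypothetical. Because full subreads span the whole insert, this gives a rough view of insert lengths, which in turn bounds how many passes are possible.

```python
# Sketch: read-length histogram from an unwrapped 4-line FASTQ (no external libraries).
import sys
from collections import Counter
from typing import Iterable, Iterator


def fastq_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield read lengths from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    # zip(it, it, it, it) pulls four consecutive lines per record;
    # a truncated final record is silently dropped.
    for header, seq, _plus, _qual in zip(it, it, it, it):
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        yield len(seq.rstrip("\r\n"))


def length_histogram(lengths: Iterable[int], bin_size: int = 1000) -> dict[int, int]:
    """Count reads per length bin; keys are bin start positions in bp."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return dict(sorted(counts.items()))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python lengths.py sample.subreads.fastq
    bin_size = 1000
    with open(sys.argv[1]) as handle:
        hist = length_histogram(fastq_lengths(handle), bin_size)
    for start, n in hist.items():
        print(f"{start}-{start + bin_size - 1} bp\t{n}")
```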

I don't think PacBio (currently) officially supports any installation method other than pbconda. In the past, I used separate bax2bam and ccs commands (the second command comes from pbccs).
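The two-step workflow can be sketched as a small Python wrapper that builds the bax2bam and ccs command lines and, by default, only prints them. Several details are assumptions rather than something stated in this answer: that bax2bam -o PREFIX writes PREFIX.subreads.bam, that ccs takes the input and output BAM as positional arguments, and that the minimum-passes option is spelled --min-passes in recent pbccs releases and --minPasses in older ones. Check --help for your installed versions.

```python
# Sketch: build (and optionally run) bax2bam followed by ccs.
# Output naming, positional arguments, and the min-passes flag spelling are assumptions.
import shlex
import subprocess
import sys


def ccs_commands(prefix: str, bax_files: list[str], min_passes: int = 3,
                 min_passes_flag: str = "--min-passes") -> list[list[str]]:
    """Build the bax.h5 -> subreads.bam -> ccs.bam command lines.

    Use min_passes_flag="--minPasses" for older pbccs builds (assumption).
    """
    subreads = f"{prefix}.subreads.bam"  # assumed bax2bam output name
    return [
        ["bax2bam", "-o", prefix, *bax_files],
        ["ccs", min_passes_flag, str(min_passes), subreads, f"{prefix}.ccs.bam"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (dry run only): python make_ccs.py sample *.bax.h5
    run(ccs_commands(sys.argv[1], sys.argv[2:]))
```

Defaulting to a dry run lets you compare the printed commands against your tool versions before anything is executed.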