1) You can create an unaligned .bam file that contains the pulse information using bax2bam:
path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5
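By default, bax2bam adds IPD but not PulseWidth information. You can choose which pulse features to include with --pulsefeatures, and --losslessframes stores the frame values without the default lossy 8-bit encoding. For example: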
path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5 --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag --losslessframes
You can read more about the PacBio .bam file format here: http://pacbiofileformats.readthedocs.io/en/3.0/BAM.html
2) If you have an aligned cmp.h5 file (or a .sam alignment that you convert to a cmp.h5 file with samtoh5), you can use loadPulses to add the pulse (kinetic) information needed for base modification analysis:
loadPulses /path/to/file.bas.h5 /path/to/blasr.alignment.cmp.h5
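If you have several SMRT cell movies to convert, a small bash wrapper around the bax2bam command from step 1 can save some typing. This is only a sketch, not part of blasr: the function name bax2bam_all_movies and the BAX2BAM and DRY_RUN variables are hypothetical, and it assumes the three parts of each movie sit in one directory named <movie>.1.bax.h5, <movie>.2.bax.h5 and <movie>.3.bax.h5. It only reuses the flags shown above.

```shell
#!/usr/bin/env bash
# Sketch only: run bax2bam once per movie found in a directory.
# Hypothetical names: bax2bam_all_movies, BAX2BAM, DRY_RUN (not part of blasr).
# Assumes each movie's parts are named <movie>.1.bax.h5, <movie>.2.bax.h5, <movie>.3.bax.h5.

# The body runs in a subshell "( ... )" so variables and nullglob do not leak into the caller.
bax2bam_all_movies() (
  dir="$1"
  # Binary to call: "bax2bam" on PATH by default, or e.g.
  # BAX2BAM=path/to/blasr/utils/bax2bam/bin/bax2bam
  bin="${BAX2BAM:-bax2bam}"
  # With nullglob, a directory without .bax.h5 files gives zero iterations
  shopt -s nullglob
  for first in "$dir"/*.1.bax.h5; do
    movie="${first%.1.bax.h5}"   # strip the suffix -> <dir>/<movie>
    # Output prefix = movie name, written to the current directory.
    # If DRY_RUN is non-empty, "echo" is prepended so the command is printed, not run.
    ${DRY_RUN:+echo} "$bin" -o "$(basename "$movie")" \
      "$movie.1.bax.h5" "$movie.2.bax.h5" "$movie.3.bax.h5" \
      --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag \
      --losslessframes || exit 1
  done
)
```

For example, DRY_RUN=1 bax2bam_all_movies /path/to/movies only prints the commands so you can check them; run it without DRY_RUN to actually convert. Trim the --pulsefeatures list if you do not need every feature, since each one makes the .bam larger.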
bax2bam, loadPulses, and samtoh5 are part of the blasr package:
https://github.com/PacificBiosciences/blasr
You can then use R-kinetics to parse and work with the kinetic (base modification) information in the alignment:
https://github.com/PacificBiosciences/R-kinetics
My understanding is that PacBio may not continue to maintain samtoh5 (as they are switching to the .bam file format), but if you compiled blasr you can find it under /path/to/blasr/utils. The same is true for loadPulses, unless you compiled using pitchfork.