I have 125,000 individual PacBio reads in FASTA format, extracted from bax.h5 files. I have clustered these reads by their unique molecular identifiers, and I would now like to align the reads in each cluster to a reference genome using blasr, the PacBio SMRT Portal module.
I am interested in using the information in the bax.h5 files for the alignment rather than just the FASTA sequences. Is there any way to use the FASTA headers to build a whitelist and pull the associated read information out of the large bax.h5 files?
When I use ConsensusTools to run a Long Amplicon Analysis, for example, there are command-line options for using a "file of file names" to fetch the information for reads on a whitelist. There are no such options for blasr, but I wonder if there is a way to do the filtering beforehand. How can I use only a small, defined subset of reads from the large bax.h5 files for blasr? Thanks for any help or suggestions.
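For reference, here is a minimal sketch of the whitelist step I have in mind. It assumes my FASTA headers follow the usual PacBio naming layout, `movie/holeNumber/start_end`, and it only collects the movie name and ZMW hole number for each read. The function names and the one-entry-per-line output are my own placeholders; whether any downstream tool accepts a whitelist in exactly this form is something I have not confirmed.

```python
import io
from collections import defaultdict


def read_ids_from_fasta(handle):
    """Yield the read ID (first whitespace-delimited token) of each FASTA header."""
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def zmw_whitelist(read_ids):
    """Group ZMW hole numbers by movie name.

    Assumes IDs look like movie/holeNumber[/start_end]; anything else is skipped.
    """
    by_movie = defaultdict(set)
    for read_id in read_ids:
        parts = read_id.split("/")
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # header does not follow the assumed layout
        by_movie[parts[0]].add(int(parts[1]))
    return {movie: sorted(holes) for movie, holes in by_movie.items()}


def write_whitelist(whitelist, handle):
    """Write one movie/holeNumber entry per line (a hypothetical format)."""
    for movie in sorted(whitelist):
        for hole in whitelist[movie]:
            handle.write(f"{movie}/{hole}\n")


# Toy input standing in for one cluster's FASTA file (made-up read names).
example = io.StringIO(
    ">m1_s1_p0/553/0_1200 cluster=A\nACGT\n"
    ">m1_s1_p0/12/0_900\nGGCC\n"
    ">m1_s1_p0/553/1250_2400\nTTAA\n"
)
whitelist = zmw_whitelist(read_ids_from_fasta(example))
```

Running this once per cluster FASTA would give one whitelist per cluster. Grouping by movie name should also show which bax.h5 files each cluster touches, since the movie name is part of the bax.h5 file name.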