I have PacBio Sequel data sequenced in CLR mode. I used bedtools bamtofastq on the *subreads.bam files to extract the subreads in FASTQ format, but what I got looked like CLR reads rather than subreads. Using PacBio's BAM2fastx tools I was able to extract the subreads. I was under the assumption that the subreads.bam files contain subreads and not CLR. Am I wrong, or is something fishy with either bamtofastq or the data?
1) The primary data type output by a CLR mode sequencing run is the subreads.bam file. For all intents and purposes, CLR reads are subreads.
2) PacBio subreads do not have a meaningful base quality score. Every base quality is set to the ASCII character !, which encodes a Phred score of 0, the lowest value on the scale. Since these scores carry no information, there is little point in exporting them to FASTQ. I'd recommend FASTA instead.
3) I'd recommend bam2fasta (as suggested by @christian.dreischer) or samtools fasta to extract the subreads in FASTA format.
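To make points 2 and 3 concrete, here is a minimal Python sketch, not a replacement for bam2fasta or samtools fasta. It decodes the ! quality character under the standard Phred+33 encoding, checks whether a read's quality string is only that placeholder, and rewrites FASTQ records as FASTA. The function names, the read names, and the tiny in-memory FASTQ are all made up for illustration, and the parser assumes simple four-line FASTQ records.

```python
import io


def phred33(ch):
    """Decode one Phred+33 quality character to its integer score."""
    return ord(ch) - 33


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples.

    Assumes plain four-line records (no wrapped sequence lines).
    """
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def has_placeholder_qualities(qual):
    """True if the quality string consists only of '!' (Phred 0)."""
    return len(qual) > 0 and set(qual) == {"!"}


def fastq_to_fasta(fastq_text):
    """Drop the quality lines and return the records as FASTA text."""
    out = []
    for name, seq, _qual in read_fastq(io.StringIO(fastq_text)):
        out.append(f">{name}\n{seq}\n")
    return "".join(out)


# Hypothetical two-read example with placeholder qualities.
EXAMPLE_FASTQ = (
    "@movie/1/0_8\nACGTACGT\n+\n!!!!!!!!\n"
    "@movie/1/50_54\nGGCC\n+\n!!!!\n"
)

if __name__ == "__main__":
    print(phred33("!"))
    for name, _seq, qual in read_fastq(io.StringIO(EXAMPLE_FASTQ)):
        print(name, has_placeholder_qualities(qual))
    print(fastq_to_fasta(EXAMPLE_FASTQ), end="")
```

For real data, converting straight from the BAM is simpler. Something along the lines of samtools fasta movie.subreads.bam > movie.subreads.fasta should do it, where the file names are placeholders.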
I have the same question. Is bedtools bamtofastq appropriate for converting a PacBio BAM file to FASTQ format?