Question

bamToFastq on Sequel subreads.bam


3.8 years ago

christian.dreischer

Hi,

I have PacBio Sequel data sequenced in CLR mode. I used bedtools bamtofastq on the *subreads.bam files to extract the subreads in FASTQ format, but instead of subreads I got CLRs. Using PacBio's bam2fastx tools, I was able to extract the subreads. I assumed that the subreads.bam files contain subreads, not CLRs. Am I wrong, or is something fishy with either bamtofastq or the data?

Thanks, Chris


Written 3.8 years ago by christian.dreischer; updated 2.1 years ago by Billy Rowell.


I have the same question. Is bedtools bamtofastq appropriate for converting a PacBio BAM file to FASTQ format?

Reply by bagdevi.mishra, 2.1 years ago

Answer (2022-10-21)


2.1 years ago

Billy Rowell

There are a few points to address here:

1) The primary data output by a CLR-mode sequencing run is the subreads.bam file. For all intents and purposes, CLR reads are subreads.

2) PacBio subreads do not have meaningful base quality scores. Every base quality is set to the ASCII character "!", the lowest value on the Phred scale (Q0). Since these scores carry no information, there is little point in exporting them to FASTQ; I'd recommend FASTA instead.

3) To extract the subreads in FASTA format, I'd recommend bam2fasta (part of PacBio's bam2fastx tools, as mentioned by @christian.dreischer) or samtools fasta.
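If you have already exported FASTQ (for example with bedtools bamtofastq), the point in item 2 is easy to check yourself: with Phred+33 encoding, "!" is ASCII 33, i.e. Q0, which corresponds to an error probability of 10^0 = 1, so the quality line tells you nothing. The Python sketch below is only an illustration, not a PacBio or bedtools tool. It assumes uncompressed FASTQ with exactly 4 lines per record and Phred+33 encoding, and the function names are hypothetical. It decodes the qualities, counts records that carry any non-zero quality, and writes the sequences out as FASTA.

```python
"""Minimal sketch: inspect PacBio FASTQ qualities and convert FASTQ to FASTA.

Assumptions (not from any PacBio tool): plain, uncompressed FASTQ with
exactly 4 lines per record (no wrapped lines) and Phred+33 qualities.
"""
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string; '!' (ASCII 33) maps to Q0."""
    return [ord(c) - 33 for c in qual]


def error_probability(q: int) -> float:
    """Phred definition: P(error) = 10^(-Q/10), so Q0 gives 1.0."""
    return 10 ** (-q / 10)


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write every record from src to dst as FASTA, dropping qualities.

    Returns the number of records with at least one quality above Q0,
    i.e. records whose quality line actually carried some information.
    """
    informative = 0
    for name, seq, qual in read_fastq(src):
        if any(q > 0 for q in phred33(qual)):
            informative += 1
        dst.write(f">{name}\n{seq}\n")
    return informative
```

To run it on a real file, something like `with open("reads.fastq") as src, open("reads.fasta", "w") as dst: print(fastq_to_fasta(src, dst))` should work (the file names are placeholders). For subreads straight from a Sequel run, a printed count of 0 would be consistent with the all-"!" qualities described above. Working from the subreads.bam with bam2fasta or samtools fasta is still the simpler route.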