Question

EOF Marker absent .bam.pbi files

5.5 years ago

selplat21 ▴ 20

I received a series of subreads.bam.pbi files from my sequencing facility and I unzipped each file. When I try to samtools merge all of the files, I get an error message saying that the EOF marker is absent.

I get the following additional message:

[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes

Files are named like this: m54050R1_180207_203009.subreads.bam.pbi

Filetype is: data

Any direction is helpful.

sequencing Assembly • 2.6k views

updated 5.5 years ago by Ram 44k • written 5.5 years ago by selplat21 ▴ 20

I used the gunzip command on all pbi files as the file type was gzip compressed data.

Reply • 5.5 years ago by selplat21 ▴ 20

The PacBio BAM index file (extension bam.pbi) contains a table of semantic information about each read and its alignment (if applicable).

So I don't think samtools merge is going to work with those files, since they are a PacBio-specific format.
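For background on the error itself: samtools expects its inputs to be BGZF files, and a well-formed BGZF file ends with a fixed 28-byte empty block (the "EOF marker" listed in the SAM/BAM specification). A gunzipped file no longer has that trailer. The sketch below only checks for the trailer; the function name `has_bgzf_eof` is made up for illustration, and passing the check does not by itself mean samtools can read the file as BAM.

```python
import os

# The 28-byte empty BGZF block that terminates a well-formed BGZF/BAM file,
# as listed in the SAM/BAM format specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("m54050R1_180207_203009.subreads.bam.pbi")` on one of the unzipped files would show whether the trailer is still present.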

I assume you have corresponding *.bam files? What are you trying to do?

Reply • 5.5 years ago by GenoMax 147k

Yes I do! I am just trying to polish my current genome assembly using arrow. I figured I needed to merge the bam files in order to do this sort of polishing. I was planning to index, align, and sort after merging before running arrow.

Reply • 5.5 years ago by selplat21 ▴ 20

Even when I run the command with subreads.bam, subreads.bam.pbi, or subreads.bam.pbi.gz, I get the same errors. How would you suggest preparing raw data for polishing a completed genome?

Reply • 5.5 years ago by selplat21 ▴ 20

score 0 · Answer 1 · 2019-06-10

You didn't explain carefully what your data consists of, so it is difficult to help. From what I gather, you have an assembly (of unknown origin) and want to polish it with the raw PacBio sequencing reads.

Are you trying to merge the .bam.pbi files, or the .bam files? The .bam.pbi files are PacBio BAM index files, and samtools does not work on them. What you want to merge are the .bam files; afterwards, create a new index for the merged BAM with the pbindex program. The BAM recipes wiki has useful information regarding handling of PacBio BAM files.
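As a rough sketch of that merge-then-reindex order, the helper below only builds the two command lines (it does not run them), keeping the .bam files and skipping the .bam.pbi indexes. The function name `merge_commands`, the output name `merged.subreads.bam`, and the second input file name are invented for illustration.

```python
import shlex


def merge_commands(paths, merged="merged.subreads.bam"):
    """Build (not run) commands to merge .bam inputs and index the result."""
    # Keep only BAM files; .bam.pbi index files do not end in ".bam".
    bams = sorted(p for p in paths if p.endswith(".bam"))
    if not bams:
        raise ValueError("no .bam files among the inputs")
    return [
        ["samtools", "merge", merged, *bams],  # output first, then inputs
        ["pbindex", merged],                   # writes <merged>.pbi
    ]


if __name__ == "__main__":
    inputs = [
        "m54050R1_180207_203009.subreads.bam",
        "m54050R1_180207_203009.subreads.bam.pbi",
        "hypothetical_second_run.subreads.bam",  # made-up file name
    ]
    for cmd in merge_commands(inputs):
        print(shlex.join(cmd))
```

The printed lines can be reviewed and then pasted into a shell, or passed to `subprocess.run(cmd, check=True)` once the file names look right.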

I don't have experience with Arrow and other PacBio tools, but it seems that Arrow needs a BAM aligned to the assembly you want to polish, not the original unaligned BAMs you were given by the sequencing center. The docs I've read used BLASR for the alignment step; I don't know whether aligning with minimap2 would work with Arrow.

"I was planning to index, align, and sort after merging before running arrow."

If you haven't merged the unaligned BAMs, you can first map each subread BAM separately, sort each of them, and then merge the sorted BAMs, which will result in a sorted merged BAM. After merging, you can index the merged BAM and use it with Arrow.
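The order of operations above (align each file, sort each file, merge, index) can be sketched as a list of commands. This is only an outline under several assumptions: the aligner command is left as a caller-supplied placeholder because the exact aligner invocation is not settled in this thread, the intermediate and output file names are made up, and running both `samtools index` and `pbindex` at the end is a guess about what Arrow needs rather than something confirmed here.

```python
def polish_prep_commands(subread_bams, align_cmd,
                         merged="merged.aligned.sorted.bam"):
    """Build (not run) commands: align and sort each BAM, then merge and index.

    `align_cmd(src, dst)` must return the argv list for whatever aligner
    is used; it is deliberately left to the caller.
    """
    commands, sorted_bams = [], []
    for bam in subread_bams:
        stem = bam[:-len(".bam")] if bam.endswith(".bam") else bam
        aligned = stem + ".aligned.bam"
        sorted_bam = stem + ".aligned.sorted.bam"
        commands.append(align_cmd(bam, aligned))
        commands.append(["samtools", "sort", "-o", sorted_bam, aligned])
        sorted_bams.append(sorted_bam)
    # Merging coordinate-sorted inputs yields a coordinate-sorted output.
    commands.append(["samtools", "merge", merged, *sorted_bams])
    commands.append(["samtools", "index", merged])  # standard .bai index
    commands.append(["pbindex", merged])            # PacBio .pbi index (assumed needed)
    return commands


def placeholder_align(src, dst):
    # Hypothetical stand-in: replace with the real aligner command line.
    return ["ALIGNER", "assembly.fasta", src, dst]


if __name__ == "__main__":
    for cmd in polish_prep_commands(
        ["m54050R1_180207_203009.subreads.bam"], placeholder_align
    ):
        print(" ".join(cmd))
```

Keeping the aligner as a parameter means the same outline works whether the alignment ends up being done with BLASR or with another tool.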