EOF Marker absent .bam.pbi files
5.5 years ago
selplat21 ▴ 20

I received a series of subreads.bam.pbi files from my sequencing facility and unzipped each one. When I run samtools merge on all of the files, I get an error message saying that the EOF marker is absent.

I get the following additional message:

[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes

Files are named like this: m54050R1_180207_203009.subreads.bam.pbi

Filetype is: data

Any direction is helpful.

sequencing Assembly • 2.6k views

I used the gunzip command on all pbi files as the file type was gzip compressed data.


The PacBio BAM index file (extension bam.pbi) contains a table of semantic information about each read and its alignment (if applicable).

So I don't think samtools merge is going to work with those files, since they are a PacBio-specific format.
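To see what samtools is complaining about, here is a minimal sketch that reads the first and last bytes of a file and reports whether it looks gzip/BGZF-compressed and whether it ends with the 28-byte empty BGZF block that the SAM/BAM specification defines as the EOF marker. The function name `describe` is hypothetical, and the `PBI\x01` magic for an uncompressed PacBio index is an assumption based on my reading of the PacBio file format documentation, so check it against the current spec.

```python
import os

# The 28-byte empty BGZF block defined as the EOF marker in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def describe(path):
    """Return (kind, has_eof_marker) for the file at `path` (hypothetical helper)."""
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        # Read at most the last 28 bytes of the file.
        handle.seek(max(0, size - len(BGZF_EOF)))
        tail = handle.read()

    if head[:2] == b"\x1f\x8b":
        kind = "gzip/BGZF"          # gzip magic bytes, shared by BGZF files
    elif head == b"PBI\x01":
        kind = "uncompressed pbi"   # assumed magic string of a gunzipped .pbi
    else:
        kind = "unknown"
    return kind, tail == BGZF_EOF
```

If the guess about the format is right, a .pbi that has been run through gunzip would come back as something like `("uncompressed pbi", False)`, which would be consistent with both the "data" file type and the missing EOF marker reported above.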

I assume you have corresponding *.bam files? What are you trying to do?


Yes I do! I am just trying to polish my current genome assembly using arrow. I figured I needed to merge the bam files in order to do this sort of polishing. I was planning to index, align, and sort after merging before running arrow.


Even when I run the command with subreads.bam, subreads.bam.pbi, or subreads.bam.pbi.gz, I get the same errors. How would you suggest preparing raw data for polishing a completed genome?

5.5 years ago
h.mon 35k

You didn't explain carefully what your data consists of, so it is difficult to help. From what I gather, you have an assembly (of unknown origin) and want to polish it with the raw PacBio sequencing reads.

Are you trying to merge the .bam.pbi files, or the .bam files? The .bam.pbi files are PacBio BAM index files, and samtools will not work on them. What you want to merge are the .bam files; afterwards, create a new index for the merged BAM with the pbindex program. The BAM recipes wiki has useful information on handling PacBio BAM files.

I don't have experience with Arrow and other PacBio tools, but it seems Arrow needs a BAM that has been aligned to the assembly you want to polish, not the original unaligned BAMs you were given by the sequencing center. The docs I've read used BLASR for the alignment step; I don't know whether aligning with minimap2 would work with Arrow.

"I was planning to index, align, and sort after merging before running arrow."

If you haven't already merged the unaligned BAMs, you can instead map each subread BAM separately, sort each aligned BAM, and then merge them, which gives you a sorted, merged BAM. After merging, index that BAM and use it with Arrow.
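The order of steps above can be sketched as a small Python helper that only builds the command lines and prints them, without running anything. Treat it as a sketch under assumptions: the function name `polishing_commands`, the output file names, and `assembly.fasta` are hypothetical, and the BLASR options (`--bam`, `--out`) are written as I understand them from recent BLASR releases, so check them against `blasr --help` for your installed version. The `samtools sort -o`, `samtools merge`, `samtools index`, and `pbindex` invocations use their standard forms.

```python
import shlex


def polishing_commands(subread_bams, assembly_fasta, merged_bam="merged.sorted.bam"):
    """Build (but do not run) align -> sort -> merge -> index commands."""
    commands = []
    sorted_bams = []
    for bam in subread_bams:
        stem = bam[: -len(".bam")] if bam.endswith(".bam") else bam
        aligned = stem + ".aligned.bam"
        sorted_bam = stem + ".aligned.sorted.bam"
        # Align each unaligned subreads BAM to the assembly (BLASR flags are an assumption).
        commands.append(["blasr", bam, assembly_fasta, "--bam", "--out", aligned])
        # Coordinate-sort each aligned BAM separately.
        commands.append(["samtools", "sort", "-o", sorted_bam, aligned])
        sorted_bams.append(sorted_bam)
    # Merging already-sorted BAMs yields a sorted merged BAM.
    commands.append(["samtools", "merge", merged_bam] + sorted_bams)
    # Index the merged BAM for samtools and create the PacBio index.
    commands.append(["samtools", "index", merged_bam])
    commands.append(["pbindex", merged_bam])
    return commands


if __name__ == "__main__":
    # The second file name is a made-up placeholder.
    bams = ["m54050R1_180207_203009.subreads.bam", "second_cell.subreads.bam"]
    for command in polishing_commands(bams, "assembly.fasta"):
        print(shlex.join(command))
```

Printing the commands first makes it easy to review them, or paste them into a job script, before anything touches the data.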
