You didn't explain clearly what your data consists of, so it is difficult to help. From what I gather, you have an assembly (of unknown origin) and want to polish it with the raw PacBio sequencing reads.
Are you trying to merge the .bam.pbi files or the .bam files? The .bam.pbi files are PacBio BAM index files. Besides the fact that samtools can't read them, they are not what you want to merge: merge the .bam files, then create a new index for the merged BAM with the pbindex program. The BAM recipes wiki has useful information on handling PacBio BAM files.
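A minimal Python sketch of that step, assuming your files follow the usual movie.subreads.bam / movie.subreads.bam.pbi naming (the file names below are hypothetical). It only builds the command lines, keeping the .bam files and skipping any .pbi or .pbi.gz index files, so you can review them before running anything. samtools merge and pbindex are the tools mentioned above; the exact options you need may differ, so check the BAM recipes wiki.

```python
import shlex
from pathlib import Path


def merge_and_index_commands(files, merged="merged.subreads.bam"):
    """Build (but do not run) the commands to merge PacBio BAMs and reindex.

    Only files whose last suffix is .bam go to `samtools merge`; .bam.pbi
    index files and gzipped copies (.pbi.gz) are skipped, because they are
    PacBio-specific indexes, not read data.
    """
    bams = sorted(f for f in files if Path(f).suffix == ".bam")
    if not bams:
        raise ValueError("no .bam files found to merge")
    return [
        ["samtools", "merge", merged, *bams],
        # pbindex creates a fresh <merged>.pbi for the merged BAM.
        ["pbindex", merged],
    ]


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    inputs = [
        "movie1.subreads.bam", "movie1.subreads.bam.pbi",
        "movie2.subreads.bam", "movie2.subreads.bam.pbi.gz",
    ]
    for cmd in merge_and_index_commands(inputs):
        print(shlex.join(cmd))
```

With the example inputs this prints a samtools merge line listing only the two .bam files, followed by a pbindex line for merged.subreads.bam.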
I don't have experience with Arrow or other PacBio tools, but it seems Arrow needs a BAM aligned to the assembly you want to polish, not the original unaligned BAMs you received from the sequencing center. The docs I've read used BLASR for the alignment step; I don't know whether alignments made with minimap2 would work with Arrow.
"I was planning to index, align, and sort after merging, before running Arrow."
If you haven't merged the unaligned BAMs yet, you can instead map each subreads BAM separately, sort each one, and then merge them, which gives a sorted merged BAM. After merging, index the BAM and use it with Arrow.
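Why sorting each aligned BAM before merging is enough: samtools merge is documented as merging sorted alignment files into a single sorted output, which is a k-way merge, so no second full sort is needed. The toy Python model below uses heapq.merge on (reference index, position, read name) tuples standing in for coordinate-sorted BAM records; it only illustrates the idea and does not read real BAM files.

```python
import heapq


def merge_sorted(*inputs):
    """K-way merge of record lists that are each already sorted.

    A record is a (reference_index, position, read_name) tuple, a toy
    stand-in for a coordinate-sorted BAM record (reference_index is the
    contig's position in the header, as in real BAM coordinate order).
    heapq.merge repeatedly emits the smallest head element across inputs,
    so the result is sorted without re-sorting everything.
    """
    return list(heapq.merge(*inputs))


if __name__ == "__main__":
    # Hypothetical alignments from two separately aligned and sorted BAMs.
    movie1 = [(0, 100, "r1"), (0, 5000, "r3"), (1, 20, "r5")]
    movie2 = [(0, 250, "r2"), (1, 10, "r4")]
    merged = merge_sorted(movie1, movie2)
    # prints [(0, 100, 'r1'), (0, 250, 'r2'), (0, 5000, 'r3'), (1, 10, 'r4'), (1, 20, 'r5')]
    print(merged)
```

If one input were unsorted, the output would not be sorted either, which is why each BAM is sorted before the merge.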
I ran gunzip on all the .pbi files because their file type was reported as gzip compressed data.
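As far as I know, PacBio .pbi files are stored BGZF-compressed (the block-gzip format that BAM itself uses), so a tool that only looks at the leading bytes will report them as gzip data even though they are meant to be used as-is. The Python sketch below tells BGZF apart from plain gzip by checking the FEXTRA flag and the "BC" extra subfield. It is a simplified check that assumes "BC" is the first extra subfield, as htslib writes it; classify_header and classify_file are hypothetical helper names.

```python
import gzip

# The 28-byte BGZF end-of-file block defined in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)


def classify_header(head: bytes) -> str:
    """Return 'bgzf', 'gzip', or 'not gzip' from a file's leading bytes."""
    if head[:2] != b"\x1f\x8b":
        return "not gzip"
    # BGZF is gzip with the FEXTRA flag (0x04) set and a 'BC' subfield.
    # Simplification: assume 'BC' is the first subfield (byte offset 12).
    if len(head) >= 14 and head[3] & 0x04 and head[12:14] == b"BC":
        return "bgzf"
    return "gzip"


def classify_file(path) -> str:
    """Read the first 18 bytes of a file and classify its compression."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(18))


if __name__ == "__main__":
    print(classify_header(BGZF_EOF))                # bgzf
    print(classify_header(gzip.compress(b"hi")))    # gzip
    print(classify_header(b"plain text"))           # not gzip
```

Running classify_file on one of the original .pbi files should report "bgzf" if it is in that format.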
So I don't think samtools merge is going to work with those files, since they are a PacBio-specific extension. I assume you have corresponding *.bam files? What are you trying to do?
Yes, I do! I am just trying to polish my current genome assembly using Arrow. I figured I needed to merge the BAM files in order to do this sort of polishing. I was planning to index, align, and sort after merging, before running Arrow.
Even when I run the command with subreads.bam, subreads.bam.pbi, or subreads.bam.pbi.gz, I get the same errors. How would you suggest preparing the raw data for polishing a completed genome?