Hello All,
I am in the process of polishing an assembly produced by Falcon using Arrow. However, it fails with the error "KeyError: 'BASECALLERVERSION'". These are the steps I followed:
- pbalign reference.fasta falcon_draft_assembly.fa --nproc 32 quvir.sam
- samtools view -bS quvir.sam > quvir.bam
- samtools sort quvir.bam > sorted_quvir.bam
- samtools index sorted_quvir.bam
I then ran Quiver to polish the assembly:
quiver -j32 sorted_quvir.bam -r corrected_new_workaround.fasta -o variants.gff -o consensus_quiver.fasta
This throws the error "KeyError: 'BASECALLERVERSION'".
I tried Arrow as well, and it also fails; the command was:
arrow sorted_quvir.bam --referenceFilename corrected_new_workaround.fasta -o arrow-polished-consensus.fasta -o arrow-polished-consensus.gff -o arrow-polished-consensus.fastq -j 32
Please let me know where I am going wrong. The full error output is given below. Please note: I am using GenomicConsensus/3.0.2
Thanks a lot for your kind help, Sincerely, Dave
[W::hts_idx_load2] The index file is older than the data file: /gpfs/projects/sysbio/development/denovo/2_denovo_assembly/falcon/2_arabidopsis/falcon_test_1/falcon_test_1/pbalign_test/quiver_test/sorted_quvir.bam.bai
'BASECALLERVERSION'
Traceback (most recent call last):
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcommand/cli/core.py", line 137, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 351, in args_runner
    return tr.main()
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 265, in main
    with AlignmentSet(options.inputFilename) as peekFile:
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2723, in __init__
    super(AlignmentSet, self).__init__(*files, **kwargs)
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1987, in __init__
    super(ReadSet, self).__init__(*files, **kwargs)
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 477, in __init__
    self.updateCounts()
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2541, in updateCounts
    self.assertIndexed()
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2371, in assertIndexed
    self._assertIndexed((IndexedBamReader, CmpH5Reader))
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1944, in _assertIndexed
    self._openFiles()
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2068, in _openFiles
    resource = IndexedBamReader(location)
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 388, in __init__
    super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 202, in __init__
    self._loadReadGroupInfo()
  File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 115, in _loadReadGroupInfo
    basecallerVersion = ".".join(ds["BASECALLERVERSION"].split(".")[0:2])
KeyError: 'BASECALLERVERSION'
[ERROR] 'BASECALLERVERSION'
Traceback (most recent call last):
  (identical to the traceback above)
KeyError: 'BASECALLERVERSION'
From reading https://github.com/PacificBiosciences/pitchfork/issues/316, it seems that the KeyError 'BASECALLERVERSION' problem is caused by missing PacBio headers in the aligned BAM file. I've logged an issue about this, covering pbalign and arrow, at https://github.com/PacificBiosciences/pbcore/issues/117
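One way to check the missing-header explanation is to look at the @RG lines of the BAM header. The failing line in the traceback reads ds["BASECALLERVERSION"], where ds appears to be built from the semicolon-separated key=value pairs in the DS field of each read group. Below is a minimal sketch of that check, not pbcore's actual code: the function names and the example header (including its kit and version values) are made up for illustration.

```python
def parse_rg_description(header_line):
    """Return the key=value pairs from the DS field of one @RG header line.

    Returns None for lines that are not @RG lines, and an empty dict
    when the read group has no DS field or no key=value pairs in it.
    """
    fields = header_line.rstrip("\n").split("\t")
    if not fields or fields[0] != "@RG":
        return None
    for field in fields[1:]:
        if field.startswith("DS:"):
            items = field[3:].split(";")
            return dict(item.split("=", 1) for item in items if "=" in item)
    return {}


def read_groups_missing_key(header_text, key="BASECALLERVERSION"):
    """List the IDs of read groups whose DS field lacks `key`."""
    missing = []
    for line in header_text.splitlines():
        ds = parse_rg_description(line)
        if ds is None:
            continue  # not an @RG line
        if key not in ds:
            tags = line.split("\t")[1:]
            rg_id = next((t[3:] for t in tags if t.startswith("ID:")), "?")
            missing.append(rg_id)
    return missing


# Hypothetical header text: one PacBio-style read group and two that
# would trigger the KeyError (values are invented for this example).
EXAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.5\tSO:coordinate",
    "@RG\tID:pacbio01\tPL:PACBIO\tDS:READTYPE=SUBREAD;BINDINGKIT=100-000-000;"
    "BASECALLERVERSION=2.3.0.0;FRAMERATEHZ=75.0",
    "@RG\tID:plain01\tPL:PACBIO\tDS:converted from SAM",
    "@RG\tID:nods01\tSM:sample",
])

if __name__ == "__main__":
    print(read_groups_missing_key(EXAMPLE_HEADER))
```

To try it on a real file, the header can be dumped with samtools view -H sorted_quvir.bam > header.txt and the file contents passed to read_groups_missing_key. Any read group it reports would be expected to hit the same KeyError.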