Arrow Assembly polishing Error "KeyError: 'BASECALLERVERSION'"
6.7 years ago
David_emir

Hello All,

I am in the process of polishing an assembly produced by Falcon, using Arrow. However, it fails with the error "KeyError: 'BASECALLERVERSION'". These are the steps I followed:

  1. pbalign reference.fasta falcon_draft_assembly.fa --nproc 32 quvir.sam
  2. samtools view -bS quvir.sam > quvir.bam
  3. samtools sort quvir.bam > sorted_quvir.bam
  4. samtools index sorted_quvir.bam
  5. Ran Quiver to polish the assembly, which throws the error "KeyError: 'BASECALLERVERSION'":

    quiver -j32 sorted_quvir.bam -r corrected_new_workaround.fasta -o variants.gff -o consensus_quiver.fasta

  6. Tried Arrow as well, and it produces the error shown below:

    arrow sorted_quvir.bam --referenceFilename corrected_new_workaround.fasta -o arrow-polished-consensus.fasta -o arrow-polished-consensus.gff -o arrow-polished-consensus.fastq -j 32

Please let me know where I am going wrong. Please note: I am using GenomicConsensus/3.0.2. The full error is below.

Thanks a lot for your kind help, Sincerely, Dave

    [W::hts_idx_load2] The index file is older than the data file: /gpfs/projects/sysbio/development/denovo/2_denovo_assembly/falcon/2_arabidopsis/falcon_test_1/falcon_test_1/pbalign_test/quiver_test/sorted_quvir.bam.bai
    'BASECALLERVERSION'
    Traceback (most recent call last):
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcommand/cli/core.py", line 137, in _pacbio_main_runner
        return_code = exe_main_func(*args, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 351, in args_runner
        return tr.main()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 265, in main
        with AlignmentSet(options.inputFilename) as peekFile:
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2723, in __init__
        super(AlignmentSet, self).__init__(files, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1987, in __init__
        super(ReadSet, self).__init__(files, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 477, in __init__
        self.updateCounts()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2541, in updateCounts
        self.assertIndexed()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2371, in assertIndexed
        self._assertIndexed((IndexedBamReader, CmpH5Reader))
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1944, in _assertIndexed
        self._openFiles()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2068, in _openFiles
        resource = IndexedBamReader(location)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 388, in __init__
        super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 202, in __init__
        self._loadReadGroupInfo()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 115, in _loadReadGroupInfo
        basecallerVersion = ".".join(ds["BASECALLERVERSION"].split(".")[0:2])
    KeyError: 'BASECALLERVERSION'
    [ERROR] 'BASECALLERVERSION'


It seems from reading https://github.com/PacificBiosciences/pitchfork/issues/316 that the KeyError: 'BASECALLERVERSION' problem is caused by missing PacBio headers in the aligned BAM file. I've logged the issue, which I hit using pbalign and arrow, as https://github.com/PacificBiosciences/pbcore/issues/117.
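The last frame of the traceback looks up BASECALLERVERSION in a dictionary built from the read group's DS header tag, so a read group without that key fails immediately. The Python sketch below mimics that lookup under stated assumptions: the helper names `parse_ds` and `basecaller_major_minor` are hypothetical, the DS values are made up for illustration, and the semicolon-separated KEY=VALUE layout follows the PacBio BAM convention for DS rather than pbcore's actual source.

```python
# Hypothetical sketch of the lookup that fails in BamIO._loadReadGroupInfo.
# Not pbcore's code; it only reproduces the expression shown in the traceback.


def parse_ds(ds_value):
    """Split a DS value like 'K1=V1;K2=V2' into a dict (hypothetical helper)."""
    result = {}
    for item in ds_value.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key] = value
    return result


def basecaller_major_minor(ds):
    """Same expression as the failing line: keep the first two version fields."""
    return ".".join(ds["BASECALLERVERSION"].split(".")[0:2])


# Illustrative PacBio-style DS value (made-up kit numbers and version).
pacbio_ds = parse_ds(
    "READTYPE=SUBREAD;BINDINGKIT=100-862-200;"
    "SEQUENCINGKIT=100-861-800;BASECALLERVERSION=3.1.0.171835"
)
print(basecaller_major_minor(pacbio_ds))  # 3.1

# Hypothetical DS from a read group that lacks the key.
fasta_ds = parse_ds("READTYPE=UNKNOWN")
try:
    basecaller_major_minor(fasta_ds)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'BASECALLERVERSION'
```

This matches the thread's diagnosis: if the aligned BAM's read groups never received PacBio annotations, there is no BASECALLERVERSION for this line to find.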

6.7 years ago
liu3yang

You can use the subreads BAM file instead of the FASTA file in the first step.

6.4 years ago

Dear David, have you solved the problem? I ran into the same problem you describe. The difference is that I used blasr rather than pbalign for mapping. When I run arrow, the error comes up. Could you please tell me how to fix it? Regards, Alex


I just hit something similar, logged as https://github.com/PacificBiosciences/pbcore/issues/117


It seems you can't use a BAM file made by mapping a FASTA file in this way, because it lacks the PacBio metadata that is expected (e.g. the BASECALLERVERSION information). You should be able to map the raw unaligned PacBio BAM file instead, or use the *.subreadset.xml file, which also carries this metadata.
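A quick way to confirm this before rerunning arrow is to inspect the read-group (@RG) lines of the aligned BAM header. The Python sketch below assumes it is given the plain-text header printed by `samtools view -H sorted_quvir.bam`; the function name `missing_rg_keys` and both example headers are hypothetical, and it only checks for BASECALLERVERSION because that is the key named in the traceback (pbcore may expect other DS keys as well).

```python
# Hypothetical checker: report read groups whose DS tag lacks required PacBio keys.

REQUIRED_KEYS = ("BASECALLERVERSION",)  # only the key named in the traceback


def missing_rg_keys(header_text, required=REQUIRED_KEYS):
    """Return {read_group_id: [missing keys]} for @RG lines in a SAM header.

    A header with no @RG lines at all is reported under the key None.
    """
    rg_lines = [ln for ln in header_text.splitlines() if ln.startswith("@RG")]
    if not rg_lines:
        return {None: list(required)}
    problems = {}
    for line in rg_lines:
        # SAM header fields are TAG:VALUE, tab-separated; split on the first ':'.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        ds = {}
        for item in fields.get("DS", "").split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                ds[key] = value
        missing = [key for key in required if key not in ds]
        if missing:
            problems[fields.get("ID")] = missing
    return problems


# Made-up headers: one with a PacBio-style DS tag, one without any DS tag.
pacbio_header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@RG\tID:abc123\tPL:PACBIO\t"
    "DS:READTYPE=SUBREAD;BASECALLERVERSION=3.1.0.171835\tPU:movie1\n"
)
fasta_header = "@HD\tVN:1.5\tSO:coordinate\n@RG\tID:sample1\tPL:PACBIO\n"

print(missing_rg_keys(pacbio_header))  # {}
print(missing_rg_keys(fasta_header))   # {'sample1': ['BASECALLERVERSION']}
```

For a real file, you could save the header with `samtools view -H sorted_quvir.bam > header.txt` and call `missing_rg_keys(open("header.txt").read())`; an empty dict means every read group carries the key.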


Dear Peter, thanks for your solution!


How exactly does one use the XML file? At what point and where?
