Pauda (Bowtie) Exception While Processing Microbial_Refseq

11.3 years ago

Pavel Senin ★ 1.9k

Happy new year, folks! I got an exception while trying to build a PAUDA database for NCBI's refseq_microbial.faa:

$ pauda-build microbial_refseq.faa microbial_refseq-idx
Start ...
Reading file: microbial_refseq-idx/ref.faa
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (929.5s)
Writing mapping file 1: microbial_refseq-idx/ref.map1
Processing sequences:
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (33764.4s)
Writing PNA file: microbial_refseq-idx/ref.pna
Writing mapping file1: microbial_refseq-idx/ref.map2
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (65.4s)
Total sequences in:  30842910
Total sequences out: 18385103
Time: 34759s
Start ...
bowtie2-build microbial_refseq-idx/ref.pna microbial_refseq-idx/ref
Settings:
  Output files: "microbial_refseq-idx/ref.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  microbial_refseq-idx/ref.pna
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters!  Please divide the
reference into batches or chunks of about 3.6 billion characters or less each
and index each independently.
  Time reading reference sizes: 00:01:15
Total time for call to driver() for forward index: 00:01:15
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build microbial_refseq-idx/ref.pna microbial_refseq-idx/ref 
Deleting "microbial_refseq-idx/ref.3.bt2" file written during aborted indexing attempt.
Deleting "microbial_refseq-idx/ref.4.bt2" file written during aborted indexing attempt.

Should I just split the input FASTA into a few files, build an index and run my searches on each of them, and combine the results afterwards?
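For reference, the splitting step could be sketched roughly as below. This is a minimal sketch, not part of PAUDA: `split_fasta`, the `out_prefix` naming scheme, and the `max_chars` budget are all hypothetical. Note that the Bowtie 2 limit applies to the `ref.pna` file that pauda-build generates, so a budget counted on the protein FASTA is only an approximation and may need to be lowered.

```python
from pathlib import Path


def split_fasta(src, out_prefix, max_chars):
    """Split a FASTA file into chunks of at most max_chars sequence characters.

    Records are never cut in half; a single record longer than max_chars
    ends up alone in its own chunk. Returns the list of chunk paths written.
    The output naming (<out_prefix>.<n>.faa) is a hypothetical convention.
    """
    paths = []
    out = None        # currently open chunk file
    used = 0          # sequence characters already written to the current chunk
    record = []       # lines of the record being read (header + sequence lines)
    record_len = 0    # sequence characters in that record

    def flush():
        # Write the buffered record, opening a new chunk first if it would not fit.
        nonlocal out, used, record, record_len
        if not record:
            return
        if out is None or (used > 0 and used + record_len > max_chars):
            if out is not None:
                out.close()
            path = Path(f"{out_prefix}.{len(paths) + 1}.faa")
            paths.append(path)
            out = path.open("w")
            used = 0
        out.writelines(record)
        used += record_len
        record, record_len = [], 0

    with open(src) as fh:
        for line in fh:
            if line.startswith(">"):
                flush()                # previous record is complete
                record = [line]
            elif record:               # ignore anything before the first header
                record.append(line)
                record_len += len(line.strip())
    flush()                            # last record
    if out is not None:
        out.close()
    return paths
```

Each chunk would then go through the same command as above, e.g. `pauda-build chunk.1.faa chunk.1-idx`, with searches run against every index. How to merge the per-chunk search results is outside this sketch.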

bowtie • 4.2k views

updated 10.6 years ago by Biostar 20 • written 11.3 years ago by Pavel Senin ★ 1.9k

Either that, or use a different aligner. Bowtie 2 doesn't support references that big: as the error message says, the reference can hold at most 2^32-1 characters.
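To see how far over the limit a file is, one could count its sequence characters and compare the total with the 2^32-1 figure quoted in the bowtie2-build error. The sketch below is illustrative only: `total_sequence_chars` and `chunks_needed` are hypothetical helper names, and the count would have to be run on the file Bowtie 2 actually indexes (here `microbial_refseq-idx/ref.pna`).

```python
# Limit quoted in the bowtie2-build error message above.
BOWTIE2_MAX_CHARS = 2**32 - 1


def total_sequence_chars(fasta_path):
    """Return the number of sequence characters in a FASTA file (headers excluded)."""
    total = 0
    with open(fasta_path) as fh:
        for line in fh:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def chunks_needed(total_chars, max_chars=BOWTIE2_MAX_CHARS):
    """Smallest number of chunks so that each holds at most max_chars characters.

    This is a lower bound: records cannot be cut, so a real split may need more.
    """
    # Integer ceiling division avoids float rounding on very large counts.
    return max(1, (total_chars + max_chars - 1) // max_chars)
```

For example, `chunks_needed(total_sequence_chars("microbial_refseq-idx/ref.pna"))` gives the minimum number of pieces the reference would have to be divided into before each piece could be indexed on its own.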

11.3 years ago by Devon Ryan 105k

Do you mean instead of PAUDA, or is there a way to plug a different aligner into it?

11.3 years ago by Pavel Senin ★ 1.9k

I mean plugging a different aligner into it, though I imagine that could be a real pain :( BWA can handle larger references, so maybe try plugging that in.

11.3 years ago by Devon Ryan 105k