10.9 years ago
Pavel Senin
Happy new year, folks! I got an exception while trying to build a PAUDA database for NCBI's refseq_microbial.faa:
$ pauda-build microbial_refseq.faa microbial_refseq-idx
Start ...
Reading file: microbial_refseq-idx/ref.faa
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (929.5s)
Writing mapping file 1: microbial_refseq-idx/ref.map1
Processing sequences:
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (33764.4s)
Writing PNA file: microbial_refseq-idx/ref.pna
Writing mapping file1: microbial_refseq-idx/ref.map2
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (65.4s)
Total sequences in: 30842910
Total sequences out: 18385103
Time: 34759s
Start ...
bowtie2-build microbial_refseq-idx/ref.pna microbial_refseq-idx/ref
Settings:
Output files: "microbial_refseq-idx/ref.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
microbial_refseq-idx/ref.pna
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters! Please divide the
reference into batches or chunks of about 3.6 billion characters or less each
and index each independently.
Time reading reference sizes: 00:01:15
Total time for call to driver() for forward index: 00:01:15
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build microbial_refseq-idx/ref.pna microbial_refseq-idx/ref
Deleting "microbial_refseq-idx/ref.3.bt2" file written during aborted indexing attempt.
Deleting "microbial_refseq-idx/ref.4.bt2" file written during aborted indexing attempt.
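Before re-running the 9+ hour build, it may help to check up front whether a FASTA-formatted reference is over the limit bowtie2-build reports above (2^32 - 1 = 4,294,967,295 characters). The sketch below is a hypothetical helper, not part of PAUDA or Bowtie 2: it sums the sequence characters of a file such as ref.pna (which the log shows bowtie2-build reading as FASTA), skipping header lines and whitespace. It assumes that count approximates what bowtie2-build measures; the tool may also count some per-sequence overhead, so treat a result close to the limit as too large.

```python
import sys

# Limit reported by bowtie2-build in the error above (2^32 - 1 characters).
BOWTIE2_MAX_CHARS = 2**32 - 1


def count_residues(path: str) -> int:
    """Sum the lengths of all sequence lines in a FASTA file, skipping '>' header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    fasta = sys.argv[1]  # e.g. microbial_refseq-idx/ref.pna
    n = count_residues(fasta)
    print(f"{fasta}: {n} residues (limit {BOWTIE2_MAX_CHARS})")
    if n > BOWTIE2_MAX_CHARS:
        print("Too large for a single bowtie2-build index; split the input first.")
```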
Should I just split the input FASTA into a few files, run the build and my searches on each of them, and combine the results later?
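If you go the splitting route, a minimal sketch of the first step might look like the following. It is a hypothetical helper (the function names, the `.partN.faa` naming scheme, and the default limit are assumptions, not anything PAUDA provides) that writes whole FASTA records into consecutive chunk files, starting a new chunk whenever the next record would push the current one past `max_residues`. The default follows the "about 3.6 billion characters" advice in the bowtie2-build error; since it is not clear from the log how the length of the .faa input relates to the length of the generated ref.pna, a more conservative limit may be safer.

```python
import sys
from typing import Iterator, List, Optional, TextIO, Tuple

# Per-chunk limit taken from the bowtie2-build advice (about 3.6 billion characters).
DEFAULT_MAX_RESIDUES = 3_600_000_000


def read_records(path: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each record in a FASTA file."""
    header: Optional[str] = None
    seq_lines: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, seq_lines
                header, seq_lines = line, []
            elif header is not None:
                seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def split_fasta(path: str, prefix: str, max_residues: int = DEFAULT_MAX_RESIDUES) -> List[str]:
    """Split a FASTA file into chunks of whole records, each holding at most
    max_residues sequence characters (an oversized single record gets its own chunk).
    Returns the names of the chunk files written."""
    names: List[str] = []
    out: Optional[TextIO] = None
    used = 0
    try:
        for header, seq_lines in read_records(path):
            size = sum(len(s.strip()) for s in seq_lines)
            # Open a new chunk if none is open yet, or if this record would overflow the current one.
            if out is None or (used > 0 and used + size > max_residues):
                if out is not None:
                    out.close()
                names.append(f"{prefix}.part{len(names) + 1}.faa")
                out = open(names[-1], "w")
                used = 0
            out.write(header + "\n")
            for s in seq_lines:
                out.write(s + "\n")
            used += size
    finally:
        if out is not None:
            out.close()
    return names


if __name__ == "__main__" and len(sys.argv) > 2:
    for name in split_fasta(sys.argv[1], sys.argv[2]):
        print(name)
```

Saved under a hypothetical name such as split_fasta.py, it would be run as `python split_fasta.py microbial_refseq.faa microbial_refseq`, followed by a separate `pauda-build` (and separate searches) for each `.partN.faa` file. How to merge the per-chunk hits for each read afterwards depends on the output format and is not covered here.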
That, or just use a different aligner. Bowtie 2 doesn't support references that large.
You mean instead of PAUDA, or is there a way to plug a different aligner into it?
I mean plugging a different aligner into it, though I imagine that could be a real pain :( BWA can handle larger genomes, so maybe try plugging that in.