Bowtie2 genome indexes incomplete
3.4 years ago
Elisa • 0

I'm trying to build the genome indexes with the following command line (Ubuntu 18.04):

home/bowtie2-2.4.4-linux-x86_64/bowtie2-build ReferenceGenome GrCh38_index

where ReferenceGenome is the GrCh38.fa downloaded from NCBI, to which I have added the chr1_KI270763v1_alt.fa sequence.

The version of bowtie2 I am using is: version 2.3.4.1.

This process is computationally heavy: it takes about 3 hours to complete. At the end, the first four files (.1.bt2, .2.bt2, .3.bt2, .4.bt2) seem to be fine, but I think rev.1.bt2 and rev.2.bt2 are incomplete: they are 352.5 MB and 264.3 MB, whereas the rev.bt2 files downloaded from the bowtie2 website are 982.5 MB and 733.7 MB.
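
The size comparison described here can be made repeatable with a small shell helper. This is only a sketch: `check_bt2_index` and `size_of` are made-up names, and the check assumes that each reverse file should be at least as large as its forward counterpart, which is what the sizes quoted in this post suggest rather than a documented guarantee.

```shell
# Hypothetical helper: print the size of a file in bytes.
size_of() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: list the six files of a small bowtie2 index and
# warn when a reverse file is smaller than its forward counterpart.
# ASSUMPTION: rev.N.bt2 should not be smaller than N.bt2.
check_bt2_index() {
    prefix=$1
    status=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        file="$prefix.$suffix.bt2"
        if [ -f "$file" ]; then
            printf '%s\t%s bytes\n' "$file" "$(size_of "$file")"
        else
            printf '%s\tMISSING\n' "$file"
            status=1
        fi
    done
    for n in 1 2; do
        fwd="$prefix.$n.bt2"
        rev="$prefix.rev.$n.bt2"
        if [ -f "$fwd" ] && [ -f "$rev" ]; then
            if [ "$(size_of "$rev")" -lt "$(size_of "$fwd")" ]; then
                printf 'WARNING: %s is smaller than %s\n' "$rev" "$fwd"
                status=1
            fi
        fi
    done
    return "$status"
}
```

Calling `check_bt2_index GrCh38_index` in the directory that holds the index prints one line per file and returns a non-zero status if anything looks short or missing.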

Actually, when I try to align after building the indexes I get the following error:

Error reading _ebwt[] array: no more data 
Error: Encountered internal Bowtie 2 exception (#1) 
 (ERR): bowtie2-align exited with value 1

and I think this is related to the fact that the indexes are incomplete. Can someone help me to solve this problem?

Thank you

index bowtie2 • 1.7k views

Please do not post the same question in multiple threads: Issues with genome indexing with bowtie2-build

It appears that you are running out of memory or temporary disk space when you are building the index. Unless you post error messages (that you must have got at build stage) it is difficult to help you.
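
To see how much room there is before starting a multi-hour build, something like the following could be run first. It is a rough sketch: both function names are made up, `mem_avail_kib` assumes a Linux system that exposes `MemAvailable` in `/proc/meminfo`, and no threshold is suggested because the actual requirement of the build is not known here.

```shell
# Hypothetical helper: available space, in KiB, on the filesystem that
# holds the given directory. "df -P" forces the portable one-line layout,
# where column 4 is the available space.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: memory the kernel estimates is available, in KiB.
# ASSUMPTION: Linux with a MemAvailable line in /proc/meminfo.
mem_avail_kib() {
    awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}
```

For example, `avail_kib .` reports the space left where the index files will be written, and `mem_avail_kib` the memory available at that moment.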

Yes, you are correct that your indexes are incomplete.


No, unfortunately I didn't get any error message. How can I avoid these problems during the build stage?


If you use the bash shell, you should be able to capture any messages the program generates by doing the following:

home/bowtie2-2.4.4-linux-x86_64/bowtie2-build ReferenceGenome GrCh38_index &> message_log

This means that you will need to re-run the build process so you can capture any error message that is generated.

If you are not using bash shell then you can capture STDOUT and STDERR to two different files by doing:

home/bowtie2-2.4.4-linux-x86_64/bowtie2-build ReferenceGenome GrCh38_index > out_log 2> error_log

Once the run finishes (whether it succeeds or fails), look in the log files for clues. Post the errors here if you can't figure out the problem.
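
Since a program can stop without printing anything useful, it may also help to record the exit status alongside the two log files. The wrapper below is a minimal sketch and `run_logged` is a made-up name, not something that ships with bowtie2.

```shell
# Hypothetical wrapper: run a command, save stdout and stderr to
# <prefix>.out and <prefix>.err, then print the exit status the shell saw.
# By shell convention, a status above 128 means the command was ended by
# a signal (for example 137 = 128 + 9, SIGKILL).
run_logged() {
    log_prefix=$1
    shift
    "$@" > "$log_prefix.out" 2> "$log_prefix.err"
    run_status=$?
    printf 'exit status: %s\n' "$run_status"
    return "$run_status"
}
```

With the command from this thread it would be called as `run_logged build home/bowtie2-2.4.4-linux-x86_64/bowtie2-build ReferenceGenome GrCh38_index`, leaving `build.out` and `build.err` to inspect afterwards.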


I did not get any error. The stdout is the following:

    Settings:
      Output files: "GrCh38_index.*.bt2"
      Line rate: 6 (line is 64 bytes)
      Lines per side: 1 (side is 64 bytes)
      Offset rate: 4 (one in 16)
      FTable chars: 10
      Strings: unpacked
      Max bucket size: default
      Max bucket size, sqrt multiplier: default
      Max bucket size, len divisor: 4
      Difference-cover sample period: 1024
      Endianness: little
      Actual local endianness: little
      Sanity checking: disabled
      Assertions: disabled
      Random seed: 0
      Sizeofs: void*:8, int:4, long:8, size_t:8
    Input files DNA, FASTA:
      new.fa
    Reading reference sizes
      Time reading reference sizes: 00:00:39
    Calculating joined length
    Writing header
    Reserving space for joined string
    Joining reference sequences
      Time to join reference sequences: 00:00:21
    bmax according to bmaxDivN setting: 733947027
    Using parameters --bmax 550460271 --dcv 1024
      Doing ahead-of-time memory usage test
      Passed!  Constructing with these parameters: --bmax 550460271 --dcv 1024
    Constructing suffix-array element generator
    Building DifferenceCoverSample
      Building sPrime
      Building sPrimeOrder
      V-Sorting samples
      V-Sorting samples time: 00:02:22
      Allocating rank array
      Ranking v-sort output
      Ranking v-sort output time: 00:00:35
      Invoking Larsson-Sadakane on ranks
      Invoking Larsson-Sadakane on ranks time: 00:01:11
      Sanity-checking and returning
    Building samples
    Reserving space for 12 sample suffixes
    Generating random suffixes
    QSorting 12 sample offsets, eliminating duplicates
    QSorting sample offsets, eliminating duplicates time: 00:00:00
    Multikey QSorting 12 samples
      (Using difference cover)
      Multikey QSorting samples time: 00:00:00
    Calculating bucket sizes
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 7; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 1; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 1; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 0; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Avg bucket size: 3.66974e+08 (target: 550460270)
    Converting suffix-array elements to index image
    Allocating ftab, absorbFtab
    Entering Ebwt loop
    Getting block 1 of 8
      Reserving size (550460271) for bucket 1
      Calculating Z arrays for bucket 1
      Entering block accumulator loop for bucket 1:
      bucket 1: 10%
      bucket 1: 20%
      bucket 1: 30%
      bucket 1: 40%
      bucket 1: 50%
      bucket 1: 60%
      bucket 1: 70%
      bucket 1: 80%
      bucket 1: 90%
      bucket 1: 100%
      Sorting block of length 502996506 for bucket 1
      (Using difference cover)
      Sorting block time: 00:11:15
    Returning block of 502996507 for bucket 1
    Getting block 2 of 8


 ...
    Getting block 8 of 8
      Reserving size (550460271) for bucket 8
      Calculating Z arrays for bucket 8
      Entering block accumulator loop for bucket 8:
      bucket 8: 10%
      bucket 8: 20%
      bucket 8: 30%
      bucket 8: 40%
      bucket 8: 50%
      bucket 8: 60%
      bucket 8: 70%
      bucket 8: 80%
      bucket 8: 90%
      bucket 8: 100%
      Sorting block of length 111237186 for bucket 8
      (Using difference cover)
      Sorting block time: 00:02:09
    Returning block of 111237187 for bucket 8
    Exited Ebwt loop
    fchr[A]: 0
    fchr[C]: 866706971
    fchr[G]: 1465569915
    fchr[T]: 2066598148
    fchr[$]: 2935788109
    Exiting Ebwt::buildToDisk()
    Returning from initFromVector
    Wrote 982829122 bytes to primary EBWT file: GrCh38_index.1.bt2
    Wrote 733947032 bytes to secondary EBWT file: GrCh38_index.2.bt2
    Re-opening _in1 and _in2 as input streams
    Returning from Ebwt constructor
    Headers:
        len: 2935788109
        bwtLen: 2935788110
        sz: 733947028
        bwtSz: 733947028
        lineRate: 6
        offRate: 4
        offMask: 0xfffffff0
        ftabChars: 10
        eftabLen: 20
        eftabSz: 80
        ftabLen: 1048577
        ftabSz: 4194308
        offsLen: 183486757
        offsSz: 733947028
        lineSz: 64
        sideSz: 64
        sideBwtSz: 48
        sideBwtLen: 192
        numSides: 15290564
        numLines: 15290564
        ebwtTotLen: 978596096
        ebwtTotSz: 978596096
        color: 0
        reverse: 0
    Total time for call to driver() for forward index: 01:36:51
    Reading reference sizes
      Time reading reference sizes: 00:00:29
    Calculating joined length
    Writing header
    Reserving space for joined string
    Joining reference sequences
      Time to join reference sequences: 00:00:14
      Time to reverse reference sequence: 00:00:02
    bmax according to bmaxDivN setting: 733947027
    Using parameters --bmax 550460271 --dcv 1024
      Doing ahead-of-time memory usage test
      Passed!  Constructing with these parameters: --bmax 550460271 --dcv 1024
    Constructing suffix-array element generator
    Building DifferenceCoverSample
      Building sPrime
      Building sPrimeOrder
      V-Sorting samples
      V-Sorting samples time: 00:02:10
      Allocating rank array
      Ranking v-sort output
      Ranking v-sort output time: 00:00:34
      Invoking Larsson-Sadakane on ranks
      Invoking Larsson-Sadakane on ranks time: 00:01:02
      Sanity-checking and returning
    Building samples
    Reserving space for 12 sample suffixes
    Generating random suffixes
    QSorting 12 sample offsets, eliminating duplicates
    QSorting sample offsets, eliminating duplicates time: 00:00:00
    Multikey QSorting 12 samples
      (Using difference cover)
      Multikey QSorting samples time: 00:00:00
    Calculating bucket sizes
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 7; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 1; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 1; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 1; iterating...
    Splitting and merging
      Splitting and merging time: 00:00:00
    Split 1, merged 1; iterating...
    Avg bucket size: 4.19398e+08 (target: 550460270)
    Converting suffix-array elements to index image
    Allocating ftab, absorbFtab
    Entering Ebwt loop
    Getting block 1 of 7
      Reserving size (550460271) for bucket 1
      Calculating Z arrays for bucket 1
      Entering block accumulator loop for bucket 1:
      bucket 1: 10%
      bucket 1: 20%
      bucket 1: 30%
      bucket 1: 40%
      bucket 1: 50%
      bucket 1: 60%
      bucket 1: 70%
      bucket 1: 80%
      bucket 1: 90%
      bucket 1: 100%
      Sorting block of length 527472073 for bucket 1
      (Using difference cover)
      Sorting block time: 00:10:29
    Returning block of 527472074 for bucket 1
    Getting block 2 of 7
      Reserving size (550460271) for bucket 2
      Calculating Z arrays for bucket 2
      Entering block accumulator loop for bucket 2:
      bucket 2: 10%
      bucket 2: 20%
      bucket 2: 30%
      bucket 2: 40%
      bucket 2: 50%
      bucket 2: 60%
      bucket 2: 70%
      bucket 2: 80%
      bucket 2: 90%
      bucket 2: 100%
      Sorting block of length 523317944 for bucket 2
      (Using difference cover)
      Sorting block time: 00:10:31
    Returning block of 523317945 for bucket 2
    Getting block 3 of 7
      Reserving size (550460271) for bucket 3
      Calculating Z arrays for bucket 3
      Entering block accumulator loop for bucket 3:
      bucket 3: 10%
      bucket 3: 20%
      bucket 3: 30%
      bucket 3: 40%
      bucket 3: 50%
      bucket 3: 60%
      bucket 3: 70%
      bucket 3: 80%
      bucket 3: 90%
      bucket 3: 100%
      Sorting block of length 6563536 for bucket 3
      (Using difference cover)
      Sorting block time: 00:00:07
    Returning block of 6563537 for bucket 3
    Getting block 4 of 7
      Reserving size (550460271) for bucket 4
      Calculating Z arrays for bucket 4
      Entering block accumulator loop for bucket 4:
      bucket 4: 10%
      bucket 4: 20%
      bucket 4: 30%
      bucket 4: 40%
      bucket 4: 50%
      bucket 4: 60%
      bucket 4: 70%
      bucket 4: 80%
Process finished with exit code 0
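
A saved log can be checked mechanically for whether both passes finished. The log above contains a "Total time for call to driver()" line for the forward index only and stops partway through block 4 of the second pass. The sketch below counts those lines; the function names are made up, and it assumes that a finished build prints one such line per pass (two in total), which the log above confirms only for the forward pass.

```shell
# Hypothetical helper: count the per-pass summary lines in a saved log.
driver_passes() {
    grep -c 'Total time for call to driver()' "$1" || true
}

# Hypothetical helper: succeed only if both passes reported a total time.
# ASSUMPTION: the reverse pass prints a summary line like the forward one.
log_looks_complete() {
    [ "$(driver_passes "$1")" -ge 2 ]
}
```

For example, `log_looks_complete out_log || echo "build log stops early"` flags a log like the one posted here.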

Looks like there was no error. So did you get a proper set of indexes this time?


No, rev.1.bt2 and rev.2.bt2 are still incomplete. Do you have any other suggestions to fix this problem?


Can you index some other reference and verify that your install is working properly?
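
A quick way to try this is to index a very small FASTA file, which should finish in seconds. The sketch below writes a toy file with two made-up sequences and builds an index from it only if `bowtie2-build` is found on the PATH. The function names, file names, and sequences are all hypothetical. It also prints the version that is actually being run, which is worth checking because the original post mentions both a 2.4.4 path and version 2.3.4.1.

```shell
# Hypothetical helper: write a tiny two-sequence FASTA file (made-up data).
make_toy_fasta() {
    cat > "$1" <<'EOF'
>toy1
ACGTACGTTAGCCGATAGCTAGCTAGGATCCGATCGATTAGCGGCTATATCGCGATCGAT
>toy2
TTGACCGATAGGCTAGCTTAGCGATCGGATCTAGCTAGGCTAACGATCGATCGGATCGTA
EOF
}

# Hypothetical helper: build an index from the toy file in a work
# directory, keeping the build output in toy_build.log.
smoke_test_bowtie2() {
    workdir=$1
    make_toy_fasta "$workdir/toy.fa"
    if command -v bowtie2-build > /dev/null 2>&1; then
        bowtie2-build --version | head -n 1
        bowtie2-build "$workdir/toy.fa" "$workdir/toy_index" \
            > "$workdir/toy_build.log" 2>&1
        ls -l "$workdir"/toy_index.*.bt2
    else
        echo "bowtie2-build not found on PATH"
    fi
}
```

Running `smoke_test_bowtie2 "$(mktemp -d)"` should list six `.bt2` files if the install works. The `lambda_virus.fa` file in the `example/reference` directory of the bowtie2 distribution is another small reference that could be used instead.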
