Hello everyone,
I have been stuck for quite some time on a side project with Hi-C, which requires first indexing the genome with bowtie2. I have used it on several other reference genomes with no issues, but I just can't make it work for this one particular genome.
I use bowtie2-2.3.5.1, and my command is: bowtie2-build mother_raw_wtdbg2_58x_polished.fa bowtie2_index/mother_raw_wtdbg2_58x_polished
This is the output I am getting:
Settings:
Output files: "bowtie2_index/mother_raw_wtdbg2_58x_polished.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
mother_raw_wtdbg2_58x_polished.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:33
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:25
bmax according to bmaxDivN setting: 660338289
Using parameters --bmax 495253717 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 495253717 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:01:27
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:18
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:44
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 2.64135e+09 (target: 495253716)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
And this is the directory where it writes its output:
sbsuser@node125: /work/sbsuser/test/roxane/bowtie2 $ll
total 630M
-rw-r--r-- 1 sbsuser GET-PLAGE 72K Dec 10 10:48 bovin_genome_index.1.bt2
-rw-r--r-- 1 sbsuser GET-PLAGE 0 Dec 10 10:48 bovin_genome_index.2.bt2
-rw-r--r-- 1 sbsuser GET-PLAGE 43K Dec 10 10:48 bovin_genome_index.3.bt2
-rw-r--r-- 1 sbsuser GET-PLAGE 630M Dec 10 10:48 bovin_genome_index.4.bt2
So it feels like it starts doing something, then for some reason decides the input is "empty" and stops indexing.
I have absolutely no idea what I am doing wrong, and I have been pulling my hair out over this for too long... Can someone please help me by pointing out the dumb mistake I am probably making?
Have a nice day,
Roxane
I don't think the issue is on your side. There are a couple of GitHub issues about this kind of error, e.g. https://github.com/BenLangmead/bowtie2/issues/194. I would add a comment there and see what the developers have to say.
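In the meantime, a quick sanity check of the input FASTA might help rule out a problem with the file itself. Below is a minimal sketch, not a fix: it counts the records, the total number of bases and the characters used. It assumes a plain, uncompressed FASTA file. The function name fasta_stats and the constant SMALL_INDEX_LIMIT are my own, and the 2**32 value is my understanding of the rough size below which bowtie2-build chooses a small (32-bit) index. Your log reports a bmax of 660338289 with a length divisor of 4, which suggests a joined length of about 2.64 Gbp, so the total printed here should be in that range.

```python
import sys
from collections import Counter

# Assumption: approximate size below which bowtie2-build picks a small index.
SMALL_INDEX_LIMIT = 2**32


def fasta_stats(path):
    """Return (record count, total bases, Counter of sequence characters)."""
    n_records = 0
    total_bases = 0
    chars = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_records += 1  # header line starts a new record
            else:
                seq = line.upper()
                total_bases += len(seq)
                chars.update(seq)
    return n_records, total_bases, chars


def main(path):
    n_records, total_bases, chars = fasta_stats(path)
    print(f"records:     {n_records}")
    print(f"total bases: {total_bases}")
    print(f"below small-index limit: {total_bases < SMALL_INDEX_LIMIT}")
    for char, count in chars.most_common():
        print(f"{char}\t{count}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a name of your choice (say check_fasta.py, a hypothetical name), it would be run as: python check_fasta.py mother_raw_wtdbg2_58x_polished.fa. If the total matches the ~2.64 Gbp implied by the log and the characters are only A, C, G, T and N, the input is probably fine, which would be worth mentioning in the GitHub issue.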