Hi, I'm processing a large dataset, approx. 234 GB of fasta.gz (paired-end Illumina reads), and after a week the run terminates with a bad_alloc error. It looks like a memory problem, but my server has 16 GB and the Bloom filters take less than 6 GB, so what's the problem?
I'm using 3 Gbp as the genome size; should I increase that to cover the extra memory? The estimate at the beginning of the run is approx. 2 GB of memory; is the error because I'm over that limit? (See the quick check after the log below.)
Any suggestions?
Thank you in advance.
-------------------Debloom time Wallclock 142447 s
binary pass
Insert solid Kmers in Bloom 5235620000
Inserted 5235629138 solid kmers in the bloom structure.
Insert false positive T4 256778974
Size of the Bloom table (B1) : 3766.27 MB
Size of the Bloom table (B2) : 1225.75 MB
Size of the Bloom table (B3) : 210.44 MB
Size of the Bloom table (B4) : 68.49 MB
Size of the FP table (T4) : 29.43 MB
Total 5300.37 MB for 5235629138 solid kmers ==> 8.49 bits / solid kmer
______________________________________________________
___________ Assemble from bloom filter _______________
______________________________________________________
Extrapolating the number of branching kmers from the first 3M kmers: 388193061
Indexing branching kmers 536870500 / ~388191870
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
estimated values: nbits Bloom 18103109632, nb FP 43535668, max memory 2158 MB
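For what it's worth, here is a quick back-of-envelope check of the figures in the log above (a minimal Python sketch; the numbers are copied from the log, and the reading that the "max memory" estimate covers only the Bloom bit count is my own interpretation, not something taken from the Minia documentation):

    # Sanity check on the figures printed in the log above.

    bloom_bits = 18_103_109_632    # "nbits Bloom" from the estimate line
    solid_kmers = 5_235_629_138    # solid kmers inserted in the Bloom structure
    bloom_total_mb = 5300.37       # "Total ... MB" for all Bloom/FP tables

    # The "max memory 2158 MB" estimate matches the Bloom bit count converted
    # to MB, which suggests it does not include the branching-kmer index
    # that is built afterwards.
    print(bloom_bits / 8 / 2**20)                      # ~2158 MB

    # Bits per solid kmer, as reported ("8.49 bits / solid kmer").
    print(bloom_total_mb * 2**20 * 8 / solid_kmers)    # ~8.49

    # Branching kmers: extrapolated count vs. count already indexed at the crash.
    extrapolated = 388_193_061
    indexed_at_crash = 536_870_500
    print(indexed_at_crash / extrapolated)             # ~1.38x over the estimate

If that reading is right, the bad_alloc comes from the branching-kmer indexing step, which had already gone well past the extrapolated count when it died, rather than from the Bloom filters themselves.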
Thanks for your answers!
I used k = 31 and a minimum abundance of 3. It's true, I forgot about KmerGenie... I'll try your suggestions.
Thank you again. Great software!