Hi guys, I'm having some problems assembling a 2 x 250 bp, 76x coverage data set using Minia 2.0.3:
[DSK: Collecting stats on read sample ] 100 % elapsed: 5 min 59 sec estimated remaining: 0 min 0 sec cpu: 297.9 % mem: [ 844, 844, 844] MB
[DSK: Pass 1/1, Step 2: counting kmers ] 50.3 % elapsed: 115 min 10 sec estimated remaining: 113 min 36 sec cpu: 404.8 % mem: [6337, 6430, 6430] MB Warning: forced to allocate extra memory: 14650 MB
EXCEPTION: Pool allocation failed for 1682 bytes (bank ids alloc). Current usage is 15362380148 and capacity is 15362381814
Or, sometimes it fails with this exception:
EXCEPTION: Pool allocation failed for 2808456 bytes (kmers alloc)
I ran KmerGenie (1.7016) and was surprised that it recommended a coverage cut-off of 1 at its best k of 64. As your manual recommends, I instead used a cut-off of 2, and also tried higher thresholds (3, 4, 10, even 100 and above). Unfortunately I kept getting this error. The machines I've been using have > 1.5 TB RAM, so I wouldn't expect to be running out.
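For reference, this is roughly how I invoked KmerGenie, on the same list file I pass to minia below (option names are taken from the KmerGenie help and may differ between versions):
# scan k from 21 to 121 in steps of 10, using 16 threads
kmergenie read-files.txt -l 21 -k 121 -s 10 -t 16 -o kmergenie_report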
I'm running minia like so:
minia -in read-files.txt -abundance-min 4 -kmer-size 64 -nb-cores 32 -max-memory 0
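where read-files.txt just lists the two gzipped read files, one path per line (placeholder names; as far as I know GATB tools accept either such a list file or a comma-separated list after -in):
cat > read-files.txt <<'EOF'
reads_R1.fastq.gz
reads_R2.fastq.gz
EOF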
Thanks in advance!
Hi, I'm not sure about this "-max-memory 0". Could you perhaps try a higher memory setting, e.g. "-max-memory 20000"?
Hi, I've tried values up to 2200000 for -max-memory, and at the top end I still get a similar error and a pool allocation exception.
Is it possible that it's not utilizing all of the memory specified in -max-memory? What is the significance of the three values noted after "mem:" in the log?
The three values are:
1. current memory usage as measured by the system,
2. the maximum of the values ever seen in field 1,
3. the maximum memory usage as measured by the system (ru_maxrss).
It's possible that not all memory is used.
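If you want to cross-check minia's figures independently, GNU time reports the same peak value; a rough sketch, reusing your command from above with a higher memory cap:
# "Maximum resident set size (kbytes)" in the -v output is the same ru_maxrss value
/usr/bin/time -v minia -in read-files.txt -abundance-min 4 -kmer-size 64 -nb-cores 32 -max-memory 20000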
Until we release a new official minia version, could you try this unreleased beta version of Minia 3.0? It's a Linux 64-bit binary. https://github.com/GATB/gatb-pipeline/raw/master/minia/minia
Thanks for that beta, Rayan. It was able to assemble my data at k=64, and reported a max memory usage of 872 GB. Are there any parameters I could tweak in 2.0.3 to get around this memory issue?
Hi, it is quite unusual to see such a large memory usage; I wonder what is special about your data. Can you please tell me the number of files and the total size of the read dataset files, as well as the number of distinct kmers reported by Minia 3 (ideally with the full log of output stats at the end)?
Hi Rayan, I have one set of PET reads, so two gzipped read files, 59 GB and 63 GB. Minia 3 reports 3045237911 solid kmers.
Full output stats from the run:
Thanks for your ongoing help.
When I set -max-memory to 0 and -abundance-min to "auto", minia reports a peak memory of 24.9 GB.
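For completeness, that run was essentially the following (same list file as before; I'm quoting the options from memory):
minia -in read-files.txt -kmer-size 64 -abundance-min auto -max-memory 0 -nb-cores 32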
Hi, thanks for those details. 24.9 GB seems more in line with what Minia 3 typically uses.
Here is what I think was going on:
Minia version 2 has the "Pool allocation failed" bug, which will be fixed in version 3. I told you to try a higher memory limit, but that didn't turn out to be a valid workaround.
Minia version 3 seems to have completed the assembly just fine (in 11 hours) with default parameters. (Were you happy with the resulting contig quality, by the way?)
When you set a high memory limit in Minia 3 (or even 2), the k-mer counting step uses all this memory just because it thinks it can. But it's not necessary to specify -max-memory in Minia 3, except perhaps for very large genomes (> 5 Gbp).
Hi, the contigs were a bit shorter than I was hoping for: my N50 was about 3 kb. I tried several k up to 128, where my N50 reached 4.2 kb. Anything larger than k=128 failed with this error:
I ran into this error with Minia 2 as well, even when I compiled it to support higher k according to the instructions in the manual (and this post). Is there a detail I'm missing?
About the contig quality, would you recommend a particular way to assess this?
Also, is it alright for me to use these results in a conference?
Thanks again, Austin
You have a point here: for kmer sizes >= 128, the default algorithm for one part of the de Bruijn graph construction (cascading Bloom filters) can't work.
In such a case, a specific option has to be used, i.e. you should add "-debloom original" to your minia command line (the consequence is a bigger memory peak). Could you confirm whether it works on your example?
As a matter of fact, we need to fix this so that Minia automatically falls back to this alternative algorithm as soon as the kmer size is >= 128.
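For example, the command line would look roughly like this (the kmer size and abundance values here are just placeholders for a k >= 128 run):
# "-debloom original" switches off the cascading Bloom filter debloom step
minia -in read-files.txt -kmer-size 160 -abundance-min 2 -debloom original -nb-cores 32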
Hi Erwan, Austin,
I've implemented "-debloom original" in Minia for k > 128. The change is now effective if you compile the source from GitHub, and it will be included in the next release of Minia 3.
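A sketch of the build, assuming the Minia sources from GitHub with the usual CMake procedure; I believe the bundled gatb-core takes a KSIZE_LIST variable, and the values below are just one way to enable k up to 256:
git clone --recursive https://github.com/GATB/minia
cd minia && mkdir build && cd build
# compile kmer-size specializations up to 256 so that k > 128 is available
cmake -DKSIZE_LIST="32 64 96 128 160 192 224 256" ..
make -j8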
Austin, sure, you can use those results in a conference. Thanks for checking with us.
Regarding assessment of contig quality, I recommend the QUAST software, and taking NG50s instead of N50s. In the absence of a reference it isn't easy to evaluate an assembly; one approach is FRCbam.
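Something along these lines, with placeholder file names (the -r reference is what gives you NG50 rather than plain N50):
# with -r, QUAST reports NG50 and misassembly counts; without it you still get N50/L50
quast.py contigs.fa -r reference.fa -o quast_results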
Hi guys, using "-debloom original" with the binary you provided did the trick, and I'll check out those analysis tools. Thanks for your help! Austin