Hi all,
I have 3 paired-end libraries from Illumina sequencing (151 bp). Each library has almost 15M reads. However, when I run SPAdes, it runs out of RAM when the assembly starts with k-mer size 55; the error looks like this: jemalloc: Error in malloc(): out of memory Requested: 8388608. As a result, the genome assembly could not be completed. It's a small genome (25 Mb).
I'm working on a server with 1.5 TB of RAM. This is my command:
spades.py --careful -k 55,77,99 -t 32 -m 1000 --pe1-1 work/mapeo2_F.fastq --pe1-2 work/mapeo2_R.fastq --pe1-1 work/mapeo1_F.fastq --pe1-2 work/mapeo1_R.fastq --pe1-1 work/mapeo3_F.fastq --pe1-2 work/mapeo3_R.fastq -o work/Hcol
Do you have any idea how to overcome this error?
You may have too much data for a small genome. Consider normalizing your sequence reads with bbnorm.sh before trying this assembly.
Are you the only user on this machine? If not, your program may have much less RAM available to work with.
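To put the "too much data" suggestion in numbers, here is a rough coverage estimate using the figures from the question. It assumes "15M reads" means individual reads rather than pairs (if it means pairs, double the result). The BBNorm line is only printed, not run; the output file names are placeholders I made up, and target=100 min=5 are common example values, not a tuned recommendation.

```shell
# Rough coverage estimate from the numbers given in the question.
# Assumption: 15M counts individual reads per library, not read pairs.
reads_per_library=15000000
libraries=3
read_length=151
genome_size=25000000

total_bases=$(( reads_per_library * libraries * read_length ))
coverage=$(( total_bases / genome_size ))   # integer division
echo "Estimated coverage: ~${coverage}x"

# Hypothetical BBNorm call for one library (printed only, not executed).
# Output names are placeholders; target/min are example values.
bbnorm_cmd="bbnorm.sh in=work/mapeo1_F.fastq in2=work/mapeo1_R.fastq out=work/mapeo1_norm_F.fastq out2=work/mapeo1_norm_R.fastq target=100 min=5"
echo "$bbnorm_cmd"
```

Coverage in the hundreds is far more than an assembler needs for a 25 Mb genome, which is why normalizing (or subsampling) first is worth trying.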
Not related to RAM consumption, but you are running SPAdes incorrectly. If you have three libraries, you should provide them with --pe1-1, --pe1-2, --pe2-1, --pe2-2, --pe3-1, --pe3-2. The number after "pe" is the library number.
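As a sketch of what that looks like with the files from the question, the snippet below builds the command with one library number per read pair. It only prints the command so you can check it before running; pairing library number i with mapeoi is my assumption, and the other options are copied unchanged from the original post.

```shell
# Build the SPAdes command with a distinct library number for each pair.
# Nothing is executed here; the command is only printed for inspection.
spades_cmd="spades.py --careful -k 55,77,99 -t 32 -m 1000"
for i in 1 2 3; do
  # Assumption: library i corresponds to work/mapeo${i}_F/R.fastq
  spades_cmd="$spades_cmd --pe${i}-1 work/mapeo${i}_F.fastq --pe${i}-2 work/mapeo${i}_R.fastq"
done
spades_cmd="$spades_cmd -o work/Hcol"
echo "$spades_cmd"
```

Once the printed command looks right, you can paste it into the terminal or run it from your job script.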
You can also check how much memory is actually free with free -h. Your actual free memory will be somewhere between the "free" column and the "available" column. In theory all the memory in the "available" column should be accessible to you, but in practice we've found that this sometimes isn't the case (e.g. we once had a case where a memory-mapped file was kept in memory after the program that used it had terminated, and wasn't released when the OS asked for it).
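If you want the same two numbers in a script (for example, to log them right before launching the assembly), something like this should work on Linux. It reads /proc/meminfo, which is where free gets its values; the variable names are just mine, and the GiB figures are rounded down.

```shell
# Read MemFree and MemAvailable (reported in kB) from /proc/meminfo.
# Assumption: a Linux kernel recent enough to report MemAvailable.
mem_free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

# Convert kB to GiB with integer division (rounds down).
mem_free_gib=$(( mem_free_kb / 1024 / 1024 ))
mem_avail_gib=$(( mem_avail_kb / 1024 / 1024 ))
echo "free: ${mem_free_gib} GiB, available: ${mem_avail_gib} GiB"
```

If the "available" figure is well below the value you pass to -m, other users or cached files are probably taking up the rest.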