Hi everyone, I'm currently using MaSuRCA-3.2.6 to assemble my genome, and I ran the following script:
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=8:bigmem,mem=100gb
#PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.error
#PBS -o /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.out
#PBS -N ACG-006
#PBS -q q1week
PE= pe 150 22 /pandata/LEPIWASP/ACG-0006_0027/frag_1.fastq /pandata/LEPIWASP/ACG-0006_0027/frag_2.fastq
JUMP=
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
LIMIT_JUMP_COVERAGE = 300
#otherwise keep at 0
USE_GRID=0
GRID_QUEUE=all.q
GRID_BATCH_SIZE=500000000
LHE_COVERAGE=25
MEGA_READS_ONE_PASS=0
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS = cgwErrorRate=0.15
KMER_COUNT_THRESHOLD = 1
CLOSE_GAPS=1
NUM_THREADS = 16
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
Then I got the assemble.sh file, ran it as well, and got the following .out:
[Sat Jun 16 22:32:45 CEST 2018] Processing pe library reads
[Sat Jun 16 22:49:04 CEST 2018] Average PE read length 150
[Sat Jun 16 22:49:05 CEST 2018] Using kmer size of 49 for the graph
[Sat Jun 16 22:49:06 CEST 2018] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1115876884, this automatic increase may be not enough!
[Sat Jun 16 22:49:06 CEST 2018] Creating mer database for Quorum
[Sat Jun 16 23:09:23 CEST 2018] Error correct PE.
[Sat Jun 16 23:11:49 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.
and the following .error:
/panhome/TOOLS/MaSuRCA-3.2.6/assemble.sh: line 102: 46750 Aborted quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/panhome/TOOLS/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 16 -w 10 -e 3 -M quorum_mer_db.jf pe.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
Does anyone have an idea of what is going on here? Thanks for your help.
The two FASTQ files come from an Illumina HiSeq 3000 run (150 bp reads), and the genome size of my species is around 1.5 Gb.
Also posted on SE: https://stackoverflow.com/questions/50891966/issue-using-masurca-3-2-6-assembler
Although it was done automatically, you should nevertheless increase the JF_SIZE value yourself, according to the comment by the authors. It was automatically changed to 1115876884; set it to something higher than that. You're requesting a lot of memory, so make the most of it. If the genome size is 1.5 gigabases, multiply that by your target depth of coverage. 1115876884, the value to which JF_SIZE was changed, is only about 1.1 billion. This may have caused the subsequent error, in which your read error correction failed.
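To put numbers on that rule of thumb (JF_SIZE ≈ genome size × coverage): 1.5 Gb at 17× coverage gives 25500000000, far above the automatic 1115876884. The sketch that follows estimates coverage from the read files and derives JF_SIZE from it. The function names are hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record and a uniform read length (150 bp for this data).

```bash
# Hypothetical helper: estimate sequencing depth from FASTQ files.
# Assumes uncompressed FASTQ, 4 lines per record, all reads of length read_len.
# Usage: estimate_coverage GENOME_SIZE READ_LEN file1.fastq [file2.fastq ...]
estimate_coverage() {
  local genome_size=$1 read_len=$2
  shift 2
  local total_reads=0 lines f
  for f in "$@"; do
    lines=$(wc -l < "$f")
    total_reads=$(( total_reads + lines / 4 ))
  done
  # Total bases sequenced divided by genome size, rounded down to an integer.
  echo $(( total_reads * read_len / genome_size ))
}

# Hypothetical helper: JF_SIZE following the "genome size x coverage" rule of thumb.
jf_size_for() {
  local genome_size=$1 coverage=$2
  echo $(( genome_size * coverage ))
}

# Example (paths as in the config above):
#   cov=$(estimate_coverage 1500000000 150 frag_1.fastq frag_2.fastq)
#   jf_size_for 1500000000 "$cov"
```

With bash's 64-bit integer arithmetic these values fit comfortably; for gzipped reads you would need to count lines through a decompressor instead of `wc -l` on the file directly.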
OK, I'm trying with JF_SIZE = 25500000000, thank you :)
I tried with this JF_SIZE and got the same thing:
Can you check the pe.cor.log file that it mentions?
No pe.cor.log is produced. A Google search on the issue was not very helpful either.
Are you sure? Could it be in a hidden directory, perhaps?
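One way to rule out a hidden directory is to let `find` search the whole run directory, since it descends into dot-directories by default. The .error output above also shows that quorum_error_correct_reads redirected its messages to quorum.err, so that file is worth looking for as well. A minimal sketch; `find_quorum_logs` is a hypothetical helper name:

```bash
# Hypothetical helper: list pe.cor.log and quorum.err anywhere under a directory,
# including hidden (dot) subdirectories, which find(1) does not skip.
# Usage: find_quorum_logs [RUN_DIR]   (defaults to the current directory)
find_quorum_logs() {
  local run_dir=${1:-.}
  find "$run_dir" -type f \( -name 'pe.cor.log' -o -name 'quorum.err' \) -print
}

# Example: show the end of every log found.
#   find_quorum_logs /path/to/run_dir | while read -r f; do echo "== $f"; tail -n 50 "$f"; done
```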
Actually, I believe you; this issue has been reported elsewhere with no help from anyone:
I was just about to tell you to contact the developers when I found this:
Can you try that?
Yep, I saw it too, and it gave me the correct result:
It is really weird.
So, it works now?
No, I mean I did nothing; the frag_*.fastq files were already in
Sorry, I am not sure what could be happening (and the question has gone unanswered on other sites, as you have seen).
I actually used MaSuRCA 3.2.2 relatively recently (2017) and it worked fine, but on a much smaller bacterial genome. Here is my config file:
Is there any way of looking at the PBS logs to see if the memory limit was reached?
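Besides the .error/.out files, PBS/Torque's `qstat -f <jobid>` lists a job's attributes, including the memory requested (Resource_List.mem) and the memory used so far (resources_used.mem). Attribute names and units can differ between PBS flavours, and finished jobs may drop out of qstat depending on server configuration, so treat this as a sketch; `pbs_mem_report` is a hypothetical helper that just pulls those lines out of `qstat -f` output.

```bash
# Hypothetical helper: summarise memory lines from `qstat -f <jobid>` output on stdin.
# Assumes attribute lines like "    resources_used.mem = 1234kb" and
# "    Resource_List.mem = 100gb"; check your own qstat output first.
pbs_mem_report() {
  awk -F' = ' '
    # Strip leading whitespace from the attribute name before matching.
    { sub(/^[ \t]+/, "", $1) }
    $1 == "Resource_List.mem"   { req  = $2 }
    $1 == "resources_used.mem"  { used = $2 }
    $1 == "resources_used.vmem" { vmem = $2 }
    END {
      printf "requested: %s\n", (req  != "" ? req  : "n/a")
      printf "used mem:  %s\n", (used != "" ? used : "n/a")
      printf "used vmem: %s\n", (vmem != "" ? vmem : "n/a")
    }'
}

# Example, while the job is still known to the server:
#   qstat -f "$JOBID" | pbs_mem_report
```

If "used mem" is close to the requested 100gb when quorum aborts, the memory limit is a likely suspect.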
I posted the PBS .error and .out logs just above (post #4). I'll try with your settings, thank you.
Okay, apologies that I could not help in this case. I would contact the developers, but chances are that they have, metaphorically, already 'flown the coop'.
Yes, I think so too. I tried your settings and got the same thing. Could you recommend another program for my kind of data, ideally one that is easy to use? I tried ALLPATHS-LG, but I don't have the data it requires.
What about Trinity?
But Trinity is for RNA-seq data, and mine is DNA.
Apologies for the oversight - I am involved in many threads on this website, some with slightly overlapping themes. My other recommendation for genome assembly would be ABySS: http://www.bcgsc.ca/platform/bioinfo/software/abyss