piotr.majewski • 5.5 years ago
Dear All,
I've recently encountered some issues with Unicycler assembly. I've tried to perform a hybrid assembly using
1) trimmed Illumina reads (R1+R2); format: fastqsanger.gz
2) Nanopore reads; format: fastqsanger
Unicycler readily handles individual assembly of either the Illumina or the Nanopore reads on their own. However, it fails to generate a hybrid assembly. Any suggestions?
thanks in advance,
Piotr
PS: here is the error report:
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
/pylon5/mc48nsp/xcgalaxy/main/staging/23588931/command.sh: line 95: 38467 Segmentation fault (core dumped) unicycler -t "${GALAXY_SLOTS:-4}" -o ./ --verbosity 3 --pilon_path $pilon -1 'fq1.fastq.gz' -2 'fq2.fastq.gz' -l lr.fastq --mode 'conservative' --min_fasta_length '100' --linear_seqs '0' --min_kmer_frac '0.2' --max_kmer_frac '0.95' --kmer_count '10' --depth_filter '0.25' --start_gene_id '90.0' --start_gene_cov '95.0' --min_polish_size '1000' --min_component_size '1000' --min_dead_end_size '1000' --scores '3,-6,-5,-2'
How much memory have you got available?
I am currently using 46.5 GB out of a total of 250.0 GB of space.
By memory, I mean RAM, not disk storage.
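On a typical Linux machine you can check both separately, e.g. (just a minimal sketch, assuming the standard free and df utilities are available):

# Show total and available RAM, which is what matters for the assembly
free -h

# Show disk usage of the current directory's filesystem, which is separate from RAM
df -h .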
I forgot to mention that I am running the analyses on a Galaxy server.
Would 16 GB of RAM be enough to run it offline?
How big are the files, and what size genome are you expecting?
A seg fault suggests you perhaps don't have enough memory for the hybrid assembly; it works with the two datasets on their own because less memory is required. I would be surprised if 16 GB were sufficient, but it's entirely genome/data dependent.
I am expecting a genome of somewhere around 5 Mb.
As for the input files, the nanopore data is quite extensive:
1) long reads - 2.3 GB
2) short reads R1 - 0.17 GB
3) short reads R2 - 0.16 GB
I suspect that may be too much data for your local machine. I don't know what a typical Galaxy RAM allowance is; presumably it depends on the hosting server.
It might be interesting to try randomly downsampling the reads to see if you can reach a point where it runs, assuming it's not some other issue; one way to do that is sketched below.
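For example (only a sketch, not the exact Galaxy tools; it assumes seqtk and Filtlong are installed, and the subsampling fraction and target bases are placeholders you would tune to your coverage):

# Subsample the Illumina pairs to ~25% of the reads; use the same seed (-s) for R1 and R2 so pairs stay in sync
seqtk sample -s100 fq1.fastq.gz 0.25 | gzip > fq1.sub.fastq.gz
seqtk sample -s100 fq2.fastq.gz 0.25 | gzip > fq2.sub.fastq.gz

# Reduce the Nanopore set to roughly 100x of a 5 Mb genome (~500 Mbases); Filtlong keeps the best reads rather than a random subset
filtlong --target_bases 500000000 lr.fastq | gzip > lr.sub.fastq.gz

# Re-run the hybrid assembly on the smaller inputs
unicycler -1 fq1.sub.fastq.gz -2 fq2.sub.fastq.gz -l lr.sub.fastq.gz -o unicycler_sub -t 4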
Alternatively, there are assembly + polishing workflows you could try, where you assemble the nanopore data first and then error-correct with the Illumina reads. This might reduce the burden of processing all the data at once; a rough example is sketched below.
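As an outline (again only a sketch, assuming Flye, BWA, SAMtools and Pilon are available; file names and parameters are illustrative):

# 1) Assemble the Nanopore reads on their own
flye --nano-raw lr.fastq --genome-size 5m --out-dir flye_out --threads 4

# 2) Map the Illumina reads back to the long-read assembly
bwa index flye_out/assembly.fasta
bwa mem -t 4 flye_out/assembly.fasta fq1.fastq.gz fq2.fastq.gz | samtools sort -o illumina.bam
samtools index illumina.bam

# 3) Polish the assembly with the short reads (usually worth a couple of rounds)
pilon --genome flye_out/assembly.fasta --frags illumina.bam --output polished --changes

This keeps the memory-hungry steps separate, so each stage needs less RAM than a full hybrid Unicycler run on the complete dataset.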