Unicycler - hybrid assembly failure
5.5 years ago

Dear All,

I've recently run into issues with Unicycler. I tried to perform a hybrid assembly using:

1) trimmed Illumina reads (R1+R2); format: fastqsanger.gz

2) nanopore reads; format: fastqsanger

Unicycler readily assembles either the Illumina or the Nanopore reads on their own. However, it fails to generate a hybrid assembly. Any suggestions?

Thanks in advance,

Piotr

PS: here is the error report:

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified

/pylon5/mc48nsp/xcgalaxy/main/staging/23588931/command.sh: line 95: 38467 Segmentation fault (core dumped) unicycler -t "${GALAXY_SLOTS:-4}" -o ./ --verbosity 3 --pilon_path $pilon -1'fq1.fastq.gz' -2 'fq2.fastq.gz' -l lr.fastq --mode 'conservative' --min_fasta_length '100' --linear_seqs '0' --min_kmer_frac '0.2' --max_kmer_frac '0.95' --kmer_count '10' --depth_filter '0.25' --start_gene_id '90.0' --start_gene_cov '95.0' --min_polish_size '1000' --min_component_size '1000' --min_dead_end_size '1000' --scores '3,-6,-5,-2'

genome next-gen sequencing assembly software error • 2.7k views

How much memory have you got available?


I am currently using 46.5 GB out of a total of 250.0 GB of space.


By memory, I mean RAM, not disk storage.


I forgot to mention that I am running the analyses on a Galaxy server.

Would 16 GB of RAM be enough to run it offline?


How big are the files, and what size genome are you expecting?

A segfault suggests you may not have enough memory for the hybrid assembly, whereas each dataset works on its own because less memory is required. I would be surprised if 16 GB is sufficient, but it’s entirely genome- and data-dependent.


I am expecting a genome of around 5 Mb.

As for the input files, the nanopore data set is quite large:

1) long reads - 2.3 Gb

2) short reads R1 - 0.17 Gb

3) short reads R2 - 0.16 Gb
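For a rough sense of scale, these figures can be turned into an approximate depth of coverage (total bases divided by genome length). The sketch below assumes the quoted "Gb" values are base counts; if they are file sizes instead, the true base counts will differ and the result is only an order-of-magnitude guide. The function and variable names are illustrative.

```python
def estimated_depth(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Expected genome size from the post: about 5 Mb.
GENOME_SIZE = 5e6

# Assumption: the quoted sizes are numbers of bases, not bytes on disk.
long_depth = estimated_depth(2.3e9, GENOME_SIZE)
short_depth = estimated_depth(0.17e9 + 0.16e9, GENOME_SIZE)

print(f"long reads:  ~{long_depth:.0f}x")
print(f"short reads: ~{short_depth:.0f}x")
```

Under that assumption the long reads come to several hundred fold coverage of a 5 Mb genome, far more than the short reads.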


I suspect that may be too much data for your local machine. I don’t know what a typical Galaxy RAM allowance is. Presumably it’s dependent on the hosting server.

It might be interesting to try and randomly downsample the reads to see if you can reach a point where it runs, assuming it’s not some other issue.
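The random downsampling suggested above could be sketched like this. It is a minimal illustration, not part of Unicycler: it assumes an uncompressed FASTQ with exactly four lines per record (as in `lr.fastq`), uses reservoir sampling so the input is read only once, and the function names are made up for this example.

```python
import random
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def downsample_fastq(in_path, out_path, n_reads, seed=0):
    """Write a uniform random subset of at most n_reads records to out_path.

    Uses reservoir sampling: only the n_reads kept records are held in memory.
    Returns the number of records written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(in_path) as fin:
        for i, record in enumerate(read_fastq_records(fin)):
            if i < n_reads:
                reservoir.append(record)
            else:
                # Keep the new record with probability n_reads / (i + 1).
                j = rng.randint(0, i)
                if j < n_reads:
                    reservoir[j] = record
    with open(out_path, "w") as fout:
        for record in reservoir:
            fout.writelines(record)
    return len(reservoir)
```

For example, `downsample_fastq("lr.fastq", "lr.sub.fastq", 20000)` would keep 20,000 long reads; the output name and read count are placeholders. Paired Illumina files would need both mates sampled together, which this sketch does not handle.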

Alternatively, there are assembly + polishing workflows you could try, where you assemble the nanopore data first and then error-correct with the Illumina reads. This might reduce the burden of processing too much data at once.

