Question

Canu transcriptome assembly behavior

4.6 years ago

chrys ▴ 80

Hi

I am currently using canu to generate a transcriptome assembly from 6 nanopore RNA-Seq samples. We were quite surprised that canu generated about 3.8 TB of intermediate files, which is a lot considering the input is only about 20 GB of read data.

I have used Trinity for transcriptome assembly before, but it was recommended to use canu or rnabloom for ultra-long read data, so I have no reference point for how canu should behave.

My question is: is this normal behavior for canu?

Here is the command that I used:

canu -p EXP-21-1 -d EXP-21-1 genomeSize=3.9g -nanopore EXP1-HAC* -minInputCoverage=2 -stopOnLowCoverage=1

Also, I have not yet found a satisfactory way to perform transcriptome assembly for nanopore reads, so I would appreciate any input here.

nanopore RNA-Seq canu • 1.7k views

updated 4.3 years ago by vgilbart ▴ 30 • written 4.6 years ago by chrys ▴ 80

score 3 · Accepted Answer · 2021-04-14

Hi there! Have you seen this study (https://pubmed.ncbi.nlm.nih.gov/31232449/)? In the supplementary data, they seem to have had a similar problem with Canu. They state: "Canu was the hardest tool to run, since it was consuming several terabytes of disk, more than we had available (~7 TB). After several tries, we decided to work around this issue by changing the overlapper Canu uses in the correction step from MHAP to minimap2 (version 2.6: https://github.com/lh3/minimap2/releases/download/v2.6/minimap2-2.6_x64-linux.tar.bz2), following this thread in Canu’s github: https://github.com/marbl/canu/issues/703 . MHAP’s disk usage is a function on the genome size and repeat content, so we hypothesize that it was using a very large amount of disk space because it could be confusing highly-expressed transcripts with repeats. This might limit the usage of Canu for correcting long transcriptomic reads"
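In case it helps, here is a rough sketch of what that workaround might look like applied to the command from the question. It is a dry run that only prints the command. The `corOverlapper=minimap` setting is my assumption about how Canu selects minimap2 for the correction step (it also assumes minimap2 is on your PATH), so please check it against the documentation for your Canu version and the GitHub thread above before running anything. The other options from the question are left out for brevity.

```shell
#!/usr/bin/env bash
# Dry-run sketch: builds and prints a canu command, does not execute it.
# ASSUMPTION: corOverlapper=minimap switches the correction-stage overlapper
# from MHAP to minimap2; verify against your Canu version's parameter reference.
set -eu

prefix="EXP-21-1"      # -p / -d value taken from the question
genome_size="3.9g"     # genomeSize value taken from the question
reads="EXP1-HAC*"      # read files from the question (glob kept unexpanded here)

canu_cmd="canu -p $prefix -d $prefix genomeSize=$genome_size corOverlapper=minimap -nanopore $reads"

# Print the command for review before running it by hand.
echo "$canu_cmd"
```

If you try it, something like `du -sh EXP-21-1/*` while the job runs should show whether the correction stage is still the part that fills the disk.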

Hoping it might help you (and others) in using canu!