Question

minimap2, before or after assembly?

2.5 years ago

emiliomastriani ▴ 40

Dear all, I am confused because I have seen some people run minimap2 after demultiplexing but before the assembly (CANU) [case 1], and others run minimap2/samtools/bcftools/medaka after the assembly (again CANU) [case 2]. In case 1, the reference that the fastq reads were aligned against was a publicly available sequence from NCBI (NC_003310), while in case 2 the reference was the set of contigs from CANU, used to obtain a polished consensus. To be honest, I don't understand when it is better to use the first or the second approach to get the best result. Does it depend only on whether a public reference genome is available? Could someone please give me some help? Thank you very much. Emilio

canu nanopore minimap2 • 2.2k views

updated 2.5 years ago by colindaven 7.6k • written 2.5 years ago by emiliomastriani ▴ 40

It depends entirely on what you want to do.

2.5 years ago by samuel.a.odonnell ▴ 590

minimap2 is a sequence alignment tool and does not care what kind of sequence you give it as the reference. In your examples, case 1 aligns the reads to a public reference genome, while case 2 aligns the reads to your own contigs, which then serve as the reference for polishing.
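
To make the two cases from the question concrete: the minimap2 call has the same shape in both, and only the reference changes. Below is a minimal sketch that builds (but does not run) the two command lines. The file names are hypothetical placeholders, and `map-ont` is minimap2's preset for Nanopore reads.

```python
import shlex

def minimap2_cmd(reference, reads, threads=4):
    """Build a minimap2 command that maps Nanopore reads to `reference`
    and writes SAM to stdout (-a)."""
    return ["minimap2", "-a", "-x", "map-ont", "-t", str(threads), reference, reads]

# Case 1: reads against a public reference
# (hypothetical local copy of NC_003310).
case1 = minimap2_cmd("NC_003310.fasta", "barcode01.fastq")

# Case 2: the same reads against your own Canu contigs
# (hypothetical file name).
case2 = minimap2_cmd("canu.contigs.fasta", "barcode01.fastq")

for cmd in (case1, case2):
    # Redirecting stdout to a SAM file is left to the shell.
    print(shlex.join(cmd) + " > aln.sam")
```

Case 1 tells you how your reads compare with a known genome; case 2 is the usual first step of polishing your own assembly.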

2.5 years ago by manaswwm ▴ 570

Why don't you start by listing:

your dataset
your research question
your goal (de novo assembly, find SNPs, something else)

Then people can help you more to get to your goal.

2.5 years ago by colindaven 7.6k

You are right, and I am sorry I was not clearer.
Dataset: a collection of fast5 files from a Nanopore MinION, 10 barcodes in total.
Research question: metagenomic analysis; specifically, to identify viral sequences as precisely as possible.
Goal: to improve the assembled contigs obtained with MegaHit/CANU in order to get the whole genome.

Thank you very much

2.5 years ago by emiliomastriani ▴ 40

score 0 · Answer 1 · 2022-12-06

Option 1

With viruses, you can get a long way with a single corrected read. You already generated these corrected reads as part of running Canu.
You can align these reads with e.g. minimap2 against a custom FASTA database of the viruses you expect to find. Does the result agree with the contig-level de novo assembly? Do you get more information?
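
One way to answer those two questions is to summarise minimap2's PAF output per read. The sketch below assumes the alignment was produced with something like `minimap2 -x map-ont viruses.fasta canu.correctedReads.fasta.gz > hits.paf` (file names are hypothetical). It keeps, for each read, the hit with the most matching bases, and reports an approximate identity (matches / alignment block length) and the fraction of the read that aligned.

```python
def best_hits(paf_lines):
    """Return {read: (target, identity, query_coverage)} for the best hit per read.

    PAF columns used (1-based): 1 query name, 2 query length, 3 query start,
    4 query end, 6 target name, 10 matching bases, 11 alignment block length.
    """
    best = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a valid PAF record
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        tname, nmatch, alnlen = f[5], int(f[9]), int(f[10])
        if qlen == 0 or alnlen == 0:
            continue
        # "Best" here simply means most matching bases; this is a choice
        # made for the sketch, not something minimap2 prescribes.
        if qname not in best or nmatch > best[qname][3]:
            identity = nmatch / alnlen
            coverage = (qend - qstart) / qlen
            best[qname] = (tname, identity, coverage, nmatch)
    return {q: v[:3] for q, v in best.items()}
```

With a real file you would call `best_hits(open("hits.paf"))` and compare the targets it reports with the viruses your contigs were assigned to.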

Option 2

Try another de novo assembly algorithm. metaFlye or Shasta might help.
On metaFlye, see also https://github.com/fenderglass/Flye/issues/101
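
As a rough illustration of the metaFlye route, the sketch below builds a Flye command line in metagenome mode (`--meta`). The input and output paths are hypothetical, and the choice between `--nano-raw` and `--nano-corr` depends on whether you feed it raw reads or reads that were already corrected (for example by Canu).

```python
import shlex

def metaflye_cmd(reads, out_dir, threads=8, corrected=False):
    """Build a Flye command in metagenome mode for Nanopore reads."""
    read_flag = "--nano-corr" if corrected else "--nano-raw"
    return [
        "flye", read_flag, reads,
        "--meta",                 # metagenome / uneven-coverage mode
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]

# Hypothetical per-barcode input and output locations.
cmd = metaflye_cmd("barcode01.fastq", "flye_barcode01")
print(shlex.join(cmd))
```

Flye writes its contigs to `assembly.fasta` inside the output directory, which you can then compare with the Canu contigs.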

Option 3

Further improve the current metagenomic assembly (though I think option 2 is better).
Polish with Racon, then Medaka.
Then, if possible, scaffold further with RagTag https://github.com/malonge/RagTag
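
The polishing route could be chained roughly as follows. This is only a sketch: it builds the command lines in order without running them, all paths are hypothetical, Medaka is left on its default model (you would normally pick the one matching your basecaller with `-m`), and the RagTag step is added only when a reference genome is supplied.

```python
import shlex

def polishing_steps(reads, draft, work="polish", threads=8, reference=None):
    """Return a list of (argv, stdout_file_or_None) tuples, in execution order."""
    t = str(threads)
    paf = f"{work}/reads_to_draft.paf"
    racon_out = f"{work}/racon.fasta"
    medaka_dir = f"{work}/medaka"
    steps = [
        # 1. Overlaps of the reads against the draft contigs (PAF on stdout).
        (["minimap2", "-x", "map-ont", "-t", t, draft, reads], paf),
        # 2. Racon: <reads> <overlaps> <target>; polished FASTA on stdout.
        (["racon", "-t", t, reads, paf, draft], racon_out),
        # 3. Medaka on the Racon output; writes consensus.fasta into medaka_dir.
        (["medaka_consensus", "-i", reads, "-d", racon_out,
          "-o", medaka_dir, "-t", t], None),
    ]
    if reference is not None:
        # 4. Optional reference-guided scaffolding with RagTag.
        steps.append((["ragtag.py", "scaffold", reference,
                       f"{medaka_dir}/consensus.fasta",
                       "-o", f"{work}/ragtag"], None))
    return steps

for argv, out in polishing_steps("barcode01.fastq", "canu.contigs.fasta",
                                 reference="NC_003310.fasta"):
    print(shlex.join(argv) + (f" > {out}" if out else ""))
```

Note that the minimap2 call in step 1 is exactly "case 2" from the question: the reads are aligned to your own contigs rather than to a public reference.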