Question

How to run nucmer without a reference? (viral metagenomics)

3.8 years ago

Arsenal ▴ 160

Hi!

I'm testing detection and abundance estimation of viruses in metagenomic data (viromics). In Roux et al. (2017), the authors report using nucmer (from MUMmer) to cluster the contigs:

' Contigs from all samples were clustered with nucmer (Delcher, Salzberg & Phillippy, 2003) at ≥95% ANI across ≥80% of their lengths, as in (Brum et al., 2015; Gregory et al., 2016), to generate a pool of non-redundant “population contigs” '

Where I'm stuck:

I have the contigs from all samples (which are grouped by experimental condition) in a single de novo assembly file (built with MEGAHIT). There is no reference; the samples come from mouse gut, so in theory they contain several (probably unknown) genomes.
nucmer requires at least two multi-FASTA inputs: a reference and a query.

What am I supposed to do?

Merge a selection of viral genomes and use it as reference?
Assemble the samples/groups separately and then use one assembly as reference?
Use the same assembly file as both reference and query?
Split the assembly file then use one (maybe the largest) contig as the reference?
Anything else?

Alternatively, I've performed the clustering with CD-HIT. Would nucmer be better at clustering? I can only answer that once I manage to run nucmer.

If anyone has good experience with another viromics pipeline, I would be happy to test it. Any help will be very much appreciated. Thanks!

assembly clustering metagenomics virus nucmer • 1.3k views

ADD COMMENT • link updated 3.8 years ago by colindaven 7.0k • written 3.8 years ago by Arsenal ▴ 160

Based on this post, I guess the approach is to align my contig assembly file against itself.

ADD REPLY • link 3.8 years ago by Arsenal ▴ 160
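
The self-alignment idea in the comment above can be sketched as follows. This is only a sketch under assumptions, not the command used in the cited papers: the file name `final.contigs.fa` and the prefix `self_aln` are placeholders, and the choice of `--maxmatch` and `--nosimplify` is a common suggestion for aligning a sequence set to itself rather than something stated by Roux et al. The function just builds the two command lines so they can be inspected before running.

```python
import shlex


def self_alignment_commands(contigs_fasta, prefix="self_aln"):
    """Build nucmer and show-coords command lines for an all-vs-all run.

    The same multi-FASTA file is passed as both reference and query.
    Nothing is executed here; the lists are only returned.
    """
    nucmer_cmd = [
        "nucmer",
        "--maxmatch",    # anchor on all maximal matches, not only unique ones
        "--nosimplify",  # keep shadowed clusters (useful for self-alignment)
        "-p", prefix,    # output will be <prefix>.delta
        contigs_fasta,   # reference
        contigs_fasta,   # query (same file)
    ]
    coords_cmd = [
        "show-coords",
        "-r",  # sort by reference
        "-c",  # add percent-coverage columns
        "-l",  # add sequence-length columns
        "-T",  # tab-delimited output
        "-H",  # no header lines
        prefix + ".delta",
    ]
    return nucmer_cmd, coords_cmd


if __name__ == "__main__":
    # Placeholder input name; replace with the actual assembly file.
    for cmd in self_alignment_commands("final.contigs.fa"):
        print(shlex.join(cmd))
```

The `show-coords` output would normally be redirected to a file (for example `self_aln.coords`) and filtered afterwards, since the identity and coverage cutoffs are applied to the alignments rather than passed to nucmer.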

score 1 · Answer 1 · 2021-02-16

3.8 years ago

colindaven 7.0k

You might find something like MUGSY http://mugsy.sourceforge.net/ to be a more efficient and easier alternative to nucmer. While nucmer produces very good alignments, I find it not that easy to automate in all-vs-all scenarios.

ADD COMMENT • link 3.8 years ago by colindaven 7.0k

Thank you! But I have just seen that mugsy uses nucmer as well... The mugsy help simply exposes the same nucmer options under the "-nucmeropts" parameter.

I still need to know which parameters reproduce this step: 'Contigs from all samples were clustered with nucmer (Delcher, Salzberg & Phillippy, 2003) at ≥95% ANI across ≥80% of their lengths, as in (Brum et al., 2015; Gregory et al., 2016), to generate a pool of non-redundant “population contigs” '

:(

ADD REPLY • link 3.7 years ago by Arsenal ▴ 160
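
On the open question of the ≥95% ANI / ≥80% length cutoffs: as far as I know, nucmer has no single option for these, so the thresholds are usually applied afterwards to the alignment table. Below is a minimal sketch of one way to do that, not the procedure from the cited papers. It assumes the tab-delimited output of `show-coords -r -c -l -T -H` (13 columns: S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, reference ID, query ID), assumes the 80% refers to the shorter contig, and uses a simple greedy scheme in which the longest contig of each cluster is kept as the representative. The function name `cluster_contigs` is made up for illustration.

```python
from collections import defaultdict


def cluster_contigs(rows, min_identity=95.0, min_coverage=0.80):
    """Greedy clustering from show-coords lines (tab-delimited, no header).

    Returns {representative: [members]}, with the longest contig of each
    cluster as representative. Overlapping alignments are summed naively,
    so coverage may be overestimated; the fraction is capped at 1.0.
    """
    lengths = {}
    as_query = defaultdict(int)  # (ref, qry) -> bases of qry aligned to ref
    as_ref = defaultdict(int)    # (ref, qry) -> bases of ref aligned to qry
    for line in rows:
        f = line.rstrip("\n").split("\t")
        if len(f) < 13:
            continue  # skip blank or malformed lines
        len1, len2, idy = int(f[4]), int(f[5]), float(f[6])
        ref, qry = f[11], f[12]
        lengths[ref], lengths[qry] = int(f[7]), int(f[8])
        if ref != qry and idy >= min_identity:
            as_ref[(ref, qry)] += len1
            as_query[(ref, qry)] += len2

    def covered_fraction(short, long_):
        # The pair may be reported in either direction; take the larger value.
        bases = max(as_query[(long_, short)], as_ref[(short, long_)])
        return min(1.0, bases / lengths[short])

    clusters = {}
    # Longest contigs first, so representatives are always the longer member.
    for contig in sorted(lengths, key=lambda c: (-lengths[c], c)):
        for rep in clusters:
            if covered_fraction(contig, rep) >= min_coverage:
                clusters[rep].append(contig)
                break
        else:
            clusters[contig] = [contig]
    return clusters
```

Usage would be along the lines of `cluster_contigs(open("self_aln.coords"))`, where the file name is again a placeholder. Contigs that never appear in the coords file are not seen by this function and would have to be added as singletons. MUMmer's `delta-filter` could also be used to pre-filter alignments by identity before this step.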