Question

How to make a hybrid assembly of a viral genome

0

3.0 years ago

Kumar ▴ 170

Hi,

I am looking to make a hybrid assembly of a viral genome. I have paired-end Illumina reads and MinION long reads from a viral sample. I tried Unicycler for the hybrid assembly, but it is designed for bacterial genomes. Could anyone please suggest a pipeline for viral genome assembly? Also, please let me know of a program for quality checking and trimming MinION data. I am trying LongQC and MinIONQC, but I'm not sure whether these are appropriate.
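As a quick sanity check alongside dedicated QC tools, basic length statistics of the long reads can be computed directly from the FASTQ file. The following is only a minimal sketch, not a substitute for LongQC or MinIONQC. It assumes standard 4-line FASTQ records (unwrapped sequences), and the helper names `open_fastq`, `read_lengths`, `n50` and `summarize` are made up for illustration.

```python
import gzip
import io


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_lengths(handle):
    """Return the length of every read, assuming 4-line FASTQ records."""
    lengths = []
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads


def summarize(lengths):
    """Collect basic length statistics for a list of read lengths."""
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny in-memory example; for real data pass open_fastq("reads.fastq.gz")
    # (a placeholder file name) to read_lengths instead.
    demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
    print(summarize(read_lengths(demo)))
```

A very short mean length or a low N50 would suggest that the long reads may not span much of the genome, which is worth knowing before choosing an assembler.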

Thank you!

Virus Illumina Hybrid Genome assembly MinION • 2.4k views

ADD COMMENT • link updated 3.0 years ago by Buffo ★ 2.4k • written 3.0 years ago by Kumar ▴ 170

score 0 · Answer 1 · 2021-12-15

0

3.0 years ago

Buffo ★ 2.4k

Try SPAdes. It is a very popular assembler that supports hybrid assemblies, and recent releases include pipelines for viruses:

It’s all about the viruses: new coronaSPAdes, rnaviralSPAdes and metaviralSPAdes pipelines.

ADD COMMENT • link 3.0 years ago by Buffo ★ 2.4k

0

I have a DNA virus. Which SPAdes pipeline should I use for a hybrid assembly? I checked the SPAdes manual, but I did not find a command to run SPAdes with both Illumina and MinION data.

ADD REPLY • link 3.0 years ago by Kumar ▴ 170

0

Read the manual:

The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads.

MinION is an Oxford Nanopore sequencer, so your long reads count as Oxford Nanopore reads.

ADD REPLY • link 3.0 years ago by Buffo ★ 2.4k

0

I ran SPAdes with the following command. However, the output files (scaffolds.fasta and contigs.fasta) contain several nodes. The purpose of the assembly is to obtain the whole viral genome, so it should come out as a single sequence in one FASTA file.

command: spades.py -k 21,33,55,77 --careful -1 file_R1_001.fastq.gz -2 file_R2_001.fastq.gz --nanopore merge.fastq.gz -o out_spades
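For scripted runs, the command above could be built as an argument list instead of a hand-typed string. This is only a sketch: `build_spades_cmd` is a hypothetical helper, and every flag and file name is taken unchanged from the command in this post.

```python
import shlex


def build_spades_cmd(r1, r2, nanopore, outdir, kmers=(21, 33, 55, 77)):
    """Build the argument list for the hybrid SPAdes run quoted in this post."""
    return [
        "spades.py",
        "-k", ",".join(str(k) for k in kmers),
        "--careful",
        "-1", r1,               # Illumina forward reads
        "-2", r2,               # Illumina reverse reads
        "--nanopore", nanopore,  # MinION long reads
        "-o", outdir,
    ]


cmd = build_spades_cmd(
    "file_R1_001.fastq.gz", "file_R2_001.fastq.gz", "merge.fastq.gz", "out_spades"
)
print(shlex.join(cmd))

# To actually run it (requires SPAdes to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the arguments in a list makes it easier to rerun the same assembly with a different k-mer set or a subsampled read file.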

ADD REPLY • link 3.0 years ago by Kumar ▴ 170

0

What exactly do you mean by:

it is showing several nodes

ADD REPLY • link 3.0 years ago by Buffo ★ 2.4k

0

"Several nodes" means multiple scaffolds or contigs (multiple FASTA records, each starting with >) in one file, whereas a whole-genome assembly should give one complete genome as a single FASTA record in one file. Usually, when I run Unicycler, it generates one complete FASTA sequence in a file. It removes all the gaps and small scaffolds or contigs and generates one complete sequence. An assembled viral genome is one complete sequence; see NC_001802.1, for example, which is a single-sequence genome. However, SPAdes assembles the reads into multiple scaffolds (>NODE_1, >NODE_2, and so on) in one file.
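To see how fragmented the result is, the records in contigs.fasta can be counted and their lengths listed. This is a minimal sketch: `fasta_lengths` is a hypothetical helper, and the record names and sequences in the demo are invented, not real SPAdes output.

```python
import io


def fasta_lengths(handle):
    """Map each FASTA record name to its sequence length, in file order."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            # Use the first word of the header, or a generated name if it is empty.
            name = words[0] if words else f"record_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Invented two-record example; for real data use open("contigs.fasta").
    demo = io.StringIO(">NODE_1\nACGTACGTAC\nGTACGT\n>NODE_2\nACGT\n")
    lengths = fasta_lengths(demo)
    print(f"{len(lengths)} records")
    for name, length in sorted(lengths.items(), key=lambda kv: -kv[1]):
        print(name, length)
```

If one record is close to the expected genome length and the rest are short, the assembly may be nearly complete and the small contigs may be noise or contamination; if the lengths are spread evenly, the genome is genuinely fragmented.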

ADD REPLY • link 3.0 years ago by Kumar ▴ 170

0

That's the problem:

it should come in one complete fasta

Assembly performance depends on many variables, such as coverage, sequence quality, and genome complexity. So, if your result is fragmented, the cause is probably one or more of those variables. You should start by analyzing your input data.
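One simple place to start is a rough estimate of sequencing depth: total sequenced bases divided by the expected genome length. The sketch below uses placeholder numbers (50,000 read pairs of 150 bp and a 10,000 bp genome); none of them come from the data in this thread, and `estimated_coverage` is a hypothetical helper.

```python
def estimated_coverage(total_bases, genome_size):
    """Average depth = sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Placeholder values for illustration only.
read_pairs = 50_000
read_length = 150
genome_size = 10_000

total_bases = read_pairs * 2 * read_length  # two reads per pair
print(f"~{estimated_coverage(total_bases, genome_size):.0f}x")
```

This is only an average over the whole genome; per-position depth from a read mapping would show whether particular regions are poorly covered.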

ADD REPLY • link 3.0 years ago by Buffo ★ 2.4k

0

Yes, but when I use Unicycler it gives one fragment. Therefore, I am concerned about the assembler. I am not using Unicycler because its documentation says it is intended for bacterial assembly.

ADD REPLY • link 3.0 years ago by Kumar ▴ 170

0

I strongly suggest you read "Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly". Genome assemblers do not perform equally across species, data sources, data quality, and so on.

ADD REPLY • link 3.0 years ago by Buffo ★ 2.4k