How to make a hybrid assembly of a viral genome
3.0 years ago
Kumar ▴ 170

Hi,

I am looking to make a hybrid assembly of a viral genome. I have paired-end Illumina reads and MinION long reads from a viral sample. I tried Unicycler for the hybrid assembly, but it is intended for bacterial genome assembly. Could anyone please suggest a pipeline for viral genome assembly? Also, please recommend a program for quality checking and trimming MinION data. I am trying LongQC and MinIONQC, but I'm not sure whether they are appropriate.

Thank you!

Virus Illumina Hybrid Genome assembly MinION • 2.4k views
3.0 years ago
Buffo ★ 2.4k

Try SPAdes. It is a very popular assembler (hybrid assemblies included), and there are new releases for viruses:

It’s all about the viruses: new coronaSPAdes, rnaviralSPAdes and metaviralSPAdes pipelines.


I have a DNA virus. Which SPAdes pipeline should I use for a hybrid assembly? I checked the SPAdes manual but did not find a command to run SPAdes on Illumina and MinION data together.


Read the manual:

"The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads."

Oxford Nanopore = MinION


I ran SPAdes with the following command. However, the output files (scaffolds.fasta and contigs.fasta) contain several nodes. My goal is to assemble the whole viral genome, so the result should be a single FASTA sequence.

command: spades.py -k 21,33,55,77 --careful -1 file_R1_001.fastq.gz -2 file_R2_001.fastq.gz --nanopore merge.fastq.gz -o out_spades


What exactly do you mean by "it is showing several nodes"?


By several nodes I mean multiple scaffolds or contigs (multiple FASTA ">" headers) in one file, whereas a whole-genome assembly should give one complete sequence under a single header. When I use Unicycler, it usually generates one complete sequence in the file: it removes the gaps and small scaffolds or contigs and produces one complete sequence. An assembled viral genome is one complete sequence. See NC_001802.1, for example, which is a single-sequence genome. SPAdes, however, assembles the reads into >NODE1, >NODE2 (multiple scaffolds) in one file.


That's the problem:

"it should come in one complete fasta"

Assembly performance depends on many variables, such as coverage, sequence quality, and genome complexity. If your result is fragmented, one or more of those variables is probably the cause. You should start by analyzing your input data.
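As a starting point for that input check, here is a minimal Python sketch that reports read count, total bases, read N50 and a rough mean depth. It assumes a plain 4-line-per-record FASTQ; the function names, the toy reads and the `genome_size` value are placeholders for illustration, not anything from your data or from SPAdes.

```python
import io


def read_lengths(handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield len(line.strip())


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads at all


def summarize(lengths, genome_size):
    """Return read count, total bases, read N50 and estimated mean depth."""
    lengths = list(lengths)
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "bases": total,
        "n50": n50(lengths),
        "depth": total / genome_size,  # rough estimate: total bases / genome size
    }


# Toy reads standing in for real data. For a real file, open it with
# gzip.open("merge.fastq.gz", "rt") and pass that handle instead.
toy_fastq = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGTAC\n+\nIIIIII\n"
    "@r3\nACGT\n+\nIIII\n"
)
stats = summarize(read_lengths(toy_fastq), genome_size=10)  # placeholder size
print(stats)
```

Very low depth, or long reads that are much shorter than the genome, would be one plausible reason for a fragmented result. This only looks at lengths; it says nothing about base quality or contamination, which dedicated QC tools cover.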


Yes, but when I use Unicycler it gives a single fragment. That is why I am concerned about the assembler. I am not using Unicycler because it is described as being specifically for bacterial assembly.


I strongly suggest you read "Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly". Genome assemblers do not perform equally across species, data sources, data quality, and so on.
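If you want to compare the output of two assemblers on your own data, a small Python sketch like the one below can report contig count, total length, longest contig and N50 for each FASTA file. The helper names and the two toy assemblies are made up for illustration; they are not real SPAdes or Unicycler output.

```python
import io


def fasta_lengths(handle):
    """Return a list of (header, sequence length) pairs from a FASTA stream."""
    records = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], 0])  # new record, length counted below
        elif records:
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]


def assembly_summary(records):
    """Summarize an assembly: contig count, total length, longest contig, N50."""
    ordered = sorted((length for _, length in records), reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running >= total / 2:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total": total,
        "longest": ordered[0] if ordered else 0,
        "n50": n50,
    }


# Toy assemblies: one single-sequence result and one fragmented result.
# For real files, use open("out_spades/scaffolds.fasta") instead of io.StringIO.
single = io.StringIO(">1\nACGTACGT\nACGT\n")
fragmented = io.StringIO(">NODE1\nACGTACGT\n>NODE2\nACG\n>NODE3\nA\n")

single_stats = assembly_summary(fasta_lengths(single))
fragmented_stats = assembly_summary(fasta_lengths(fragmented))
print(single_stats)
print(fragmented_stats)
```

A single long contig close to the expected genome size plus a few very short, low-value nodes is a different situation from several contigs of similar size, so it is worth looking at these numbers before blaming the assembler.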
