Hi,
I am looking to make a hybrid assembly of a viral genome. I have paired-end Illumina reads and long MinION reads from a viral sample. I tried Unicycler for the hybrid assembly, but it is intended for bacterial genome assembly. Could anyone please suggest a pipeline for viral genome assembly? Also, please recommend a program for quality checking and trimming MinION data. I am trying LongQC and MinIONQC, but I'm not sure whether these are appropriate.
Thank you!
I have a DNA virus. Which SPAdes mode should I use for a hybrid assembly? I checked the SPAdes manual but did not find a command for running SPAdes on Illumina and MinION data together.
Read the manual:
Oxford Nanopore = MinION
I ran SPAdes with the following command. However, the output files (scaffolds.fasta and contigs.fasta) contain several nodes. My goal is an assembly of the whole viral genome, so it should come out as a single sequence in one FASTA file.
command: spades.py -k 21,33,55,77 --careful -1 file_R1_001.fastq.gz -2 file_R2_001.fastq.gz --nanopore merge.fastq.gz -o out_spades
What exactly do you mean by:
By "several nodes" I mean that one file contains multiple scaffold or contig records (multiple ">" headers), whereas a whole-genome assembly should be one complete record (a single ">" header) in one file. When I run Unicycler, it usually produces one complete sequence in the file: it removes the gaps and the small scaffolds or contigs and outputs a single sequence. An assembled viral genome is one complete sequence. See NC_001802.1 for an example of a single-sequence genome. SPAdes, however, writes the assembly as >NODE1, >NODE2, and so on (multiple scaffolds) in one file.
That's the problem:
Assembly performance depends on many variables, such as coverage, sequence quality, and genome complexity. If your result is fragmented, the cause is likely one or more of those variables. You should start by analyzing your input data.
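As a rough starting point for that kind of check, here is a minimal sketch that reports read count, total bases, read-length N50, mean read quality, and approximate coverage for a FASTQ file. It assumes plain 4-line FASTQ records with Phred+33 qualities; the function names and the `genome_size` parameter are made up for illustration and are not part of LongQC, MinIONQC, or SPAdes.

```python
import gzip
from statistics import mean


def read_lengths_and_quals(path):
    """Yield (length, mean Phred score) for each read in a 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline()
            if not header:
                break  # end of file
            seq = fh.readline().strip()
            fh.readline()  # '+' separator line
            qual = fh.readline().strip()
            if not seq or not qual:
                continue  # skip empty or truncated records
            # Assumption: Phred+33 encoding. This is a plain average of
            # Q scores, not an average of error probabilities.
            yield len(seq), mean(ord(c) - 33 for c in qual)


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(path, genome_size=None):
    """Summarize one FASTQ file; genome_size (bp) is an expected size you supply."""
    stats = list(read_lengths_and_quals(path))
    lengths = [length for length, _ in stats]
    summary = {
        "reads": len(lengths),
        "bases": sum(lengths),
        "n50": n50(lengths),
        "mean_read_quality": mean(q for _, q in stats) if stats else 0.0,
    }
    if genome_size:
        summary["approx_coverage"] = summary["bases"] / genome_size
    return summary


# Example call (the genome size here is a placeholder, not a known value):
# print(summarize("merge.fastq.gz", genome_size=10_000))
```

Very low coverage, a short read-length N50, or low mean quality in the long reads would each be a plausible reason for a fragmented result, so these numbers are worth looking at before switching assemblers.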
Yes, but when I use Unicycler, it gives one fragment. That is why I am concerned about the assembler program. I am not using Unicycler because its documentation says it is specifically for bacterial assembly.
I strongly suggest you read about Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly. Genome assemblers do not perform equally across species, data sources, data quality, and so on.
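To compare what different assemblers actually produced from the same reads, a small sketch like the following can list each contig's length and, for SPAdes output, the coverage value in the header. It assumes SPAdes-style names of the form `NODE_<n>_length_<len>_cov_<cov>`; contigs from other assemblers are still counted, just without a coverage value. The function names are hypothetical helpers, not part of any assembler.

```python
import re

# Assumption: SPAdes-style contig names such as NODE_1_length_9181_cov_153.2
SPADES_HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_assembly(path):
    """Return contig count, total length, and per-contig rows sorted by length."""
    rows = []
    for name, seq in read_fasta(path):
        match = SPADES_HEADER.match(name)
        cov = float(match.group(3)) if match else None
        rows.append({"name": name, "length": len(seq), "cov": cov})
    rows.sort(key=lambda r: r["length"], reverse=True)
    total = sum(r["length"] for r in rows)
    return {
        "contigs": len(rows),
        "total_length": total,
        # Share of the assembly held by the longest contig
        "longest_fraction": rows[0]["length"] / total if rows else 0.0,
        "rows": rows,
    }


# Example call on the SPAdes output directory from the command above:
# print(summarize_assembly("out_spades/contigs.fasta"))
```

If one long node close to the expected genome size holds most of the assembly and the remaining nodes are short with much lower coverage, the extra nodes may be contamination or noise rather than missing pieces of the genome, but that is something to verify rather than assume.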