Hi
I am working on a viral genome, and my reads were generated with the Ion Torrent method. After performing de novo assembly with Trinity, I got contigs. The next step is to construct a draft genome from the contigs. Are there any tools or software to create a draft genome from contigs?
Can anyone please help me?
There are several assembly pipelines geared towards viral assembly. As viral samples are a mix of host and viral sequences, one key step is "contaminant" removal. Depending on the pipeline (and also on whether host and viral reference genomes are available, and whether viral material was enriched prior to library preparation), host sequence removal can be performed before or after assembly. Check some of the available pipelines:
A set of contigs for an organism is considered the draft genome already (at least that is what I think). Do you mean to build a scaffold for these contigs - join contigs to improve your draft genome N50?
Hi,
I have some experience with eukaryote genome assembly from Illumina reads, but I've never worked on viruses or Ion Torrent data. Still, a few things to note.
I'm not sure Trinity is the most suitable tool, since it is intended for transcriptome, not genome assembly (and does a great job at it). Why did you choose it? You might be better off with a genome assembler like SPAdes.
The next step after creating contigs is usually scaffolding, in which you create longer, gapped fragments out of your contigs, called scaffolds. To do that you can use paired-end data (preferably long distance, e.g. mate pair) or long reads. SPAdes will perform scaffolding as part of the assembly, if you provide the relevant data.
Achieving a full draft genome (i.e. a single sequence per chromosome) is usually challenging, although maybe with small and simple viral genomes this might be possible. You'll have to use either an existing reference genome of a closely related virus or additional information from genetic, physical or optical maps. I also hear that people interested in viral genomics use ONT sequencing, as it can sometimes sequence the whole genome in one read, and thus no assembly is needed. However, this technology has its own problems and will require you to sequence all over again.
What is the estimated genome size of your virus? To what depth did you sequence it, and with what library?
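In case it helps with the depth question, a rough estimate is just total sequenced bases divided by genome size. Here is a minimal sketch; the function name and the example read count and read length are made up for illustration, so plug in your own numbers.

```python
def estimated_depth(n_reads, mean_read_length, genome_size):
    """Rough sequencing depth: total sequenced bases / genome size.

    All three arguments are supplied by the user; nothing here is
    read from real data.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


if __name__ == "__main__":
    # Hypothetical run: 10,000 reads of ~190 bp on a 19 kb genome.
    depth = estimated_depth(n_reads=10_000, mean_read_length=190, genome_size=19_000)
    print(f"Estimated depth: {depth:.0f}x")
```

Keep in mind this is an upper bound for a viral sample: any reads that come from the host count towards the total but do not cover the virus.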
OK, so if I understand correctly, it looks like you have decent coverage, but no paired-end info of any type. This means you can't really expect to do any scaffolding with just the data you currently have. How about using an existing assembly of another strain like this one? You could use a tool like RaGOO to do that. This approach assumes that your genome is pretty close to the reference. Also note that if you are looking for large SVs, this approach might not be optimal. What is your end goal? Why do you want to perform scaffolding? What's the planned downstream analysis?
I don't see a reason why not, but I really have no idea.
The question is - do you really need a chromosome-level assembly? Contigs may be good enough for your analysis. What is your contig N50?
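If you don't have the N50 at hand, it is easy to compute from the contig FASTA: sort contig lengths from longest to shortest and report the length at which the running sum reaches half of the total assembly size. A minimal sketch follows; the function names and the `contigs.fasta` filename are placeholders of mine, not something produced by Trinity or SPAdes.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the cumulative sum of sorted
    (longest first) contig lengths reaches half of the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs


if __name__ == "__main__":
    import sys

    # Placeholder default filename; pass your own contig file as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "contigs.fasta"
    with open(path) as handle:
        contig_lengths = fasta_lengths(handle)
    print(f"{len(contig_lengths)} contigs, N50 = {n50(contig_lengths)} bp")
```

For a genome this small, comparing the longest contig to the expected genome size is probably just as informative as the N50 itself.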
Hi liorgiic, could you tell me exactly which run you used to obtain that N50? I would like to run some tests (just for learning). I have tried SRR3107830 with SPAdes, for example, and I only get contigs shorter than 500 bp.
I think your data is not well suited for de novo assembly. There are no overlapping regions between the reads. The only overlapping regions are chimeras generated by the PCR. Whoever designed the library used a preparation that is valid for aligning reads to a reference but not for de novo assembly. At least that is the case in the run that I analyzed; I guess it will be the same in all of them.
The genome is approximately 19 kb in size.