Question

construction of draft genome from the contigs

4.8 years ago

treesavictor83 • 0

Hi, I am working on a viral genome, and my reads were generated with Ion Torrent sequencing. After de novo assembly with Trinity I obtained contigs. The next step is to construct a draft genome from these contigs. Are there any tools or software for building a draft genome from contigs? Any help would be appreciated.

Assembly • 3.4k views

ADD COMMENT • link updated 4.3 years ago by Biostar 20 • written 4.8 years ago by treesavictor83 • 0

The genome is approximately 19 kb in size.

ADD REPLY • link 4.8 years ago by mathavanbioinfo ▴ 80

score 2 · Answer 1 · 2020-01-28

There are several pipelines geared towards viral assembly. As viral samples are a mix of host and viral sequences, one key step is "contaminant" removal. Depending on the pipeline (and on whether host and viral reference genomes are available, and whether viral material was enriched before library preparation), host sequence removal can be performed before or after assembly. Check some of the available pipelines:

Personally, I was able to assemble several complete RNA viral genomes with my local version of the above pipelines:

filter viral reads / filter out host reads
assembly with Trinity (I obtained better results compared to SPAdes)
filter for virus contigs (using Blast+ / DIAMOND, and possibly coverage)
assembly of the selected contigs with CAP3
a possible final step is visualizing the assembly and mapped reads onto it with IGV, for manual corrections
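
The contig-filtering step can be sketched roughly as follows. This is a minimal illustration, not the pipeline used above: it assumes the contigs were already searched against a viral database with BLAST+ or DIAMOND using the default 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name, the thresholds, and the example hits are all hypothetical.

```python
def select_viral_contigs(tabular_hits, min_identity=70.0, min_aln_len=200,
                         max_evalue=1e-5):
    """Return the IDs of contigs with at least one hit passing all thresholds.

    tabular_hits: iterable of lines in the default 12-column tabular format.
    The threshold defaults are illustrative assumptions, not recommendations.
    """
    keep = set()
    for line in tabular_hits:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        contig_id = fields[0]         # qseqid
        identity = float(fields[2])   # pident
        aln_len = int(fields[3])      # alignment length
        evalue = float(fields[10])    # evalue
        if (identity >= min_identity and aln_len >= min_aln_len
                and evalue <= max_evalue):
            keep.add(contig_id)
    return keep


# Made-up hits for illustration only.
EXAMPLE_HITS = "\n".join([
    "contig_1\tvirusA\t95.2\t1500\t72\t0\t1\t1500\t100\t1599\t0.0\t2500",
    "contig_2\tvirusA\t62.0\t120\t45\t1\t1\t120\t300\t419\t1e-3\t60.1",
    "contig_3\tvirusB\t88.0\t800\t96\t0\t5\t804\t20\t819\t1e-150\t900",
])

viral_ids = select_viral_contigs(EXAMPLE_HITS.splitlines())
print(sorted(viral_ids))
```

The selected IDs can then be used to pull the matching sequences out of the contig FASTA before passing them to CAP3. Coverage could be added as a further filter in the same way.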

score 1 · Answer 2 · 2020-01-23

4.8 years ago

onestop_data ▴ 330

A set of contigs for an organism is considered the draft genome already (at least that is what I think). Do you mean to build a scaffold for these contigs - join contigs to improve your draft genome N50?

ADD COMMENT • link 4.8 years ago by onestop_data ▴ 330

Yes, exactly. How can I build scaffolds from those contigs? Are there any tools or software available for this?

ADD REPLY • link 4.8 years ago by treesavictor83 • 0

score 0 · Answer 3 · 2020-01-23

4.8 years ago

liorglic ★ 1.4k

Hi,
I have some experience with eukaryote genome assembly from Illumina reads, but I've never worked on viruses or Ion Torrent data. Still, a few things to note.
I'm not sure Trinity is the most suitable tool, since it is intended for transcriptome, not genome assembly (and does a great job at it). Why did you choose it? You might be better off with a genome assembler like SPAdes.
The next step after creating contigs is usually scaffolding, in which you create longer, gapped fragments out of your contigs, called scaffolds. To do that you can use paired-end data (preferably long distance, e.g. mate pair) or long reads. SPAdes will perform scaffolding as part of the assembly, if you provide the relevant data.
Achieving a full draft genome (i.e. a single sequence per chromosome) is usually challenging, although maybe with small and simple viral genomes this might be possible. You'll have to use either an existing reference genome of a closely related virus or additional information from genetic, physical or optical maps. I also hear that people interested in viral genomics use ONT sequencing, as it can sometimes sequence the whole genome in one read, and thus no assembly is needed. However, this technology has its own problems and will require you to sequence all over again.
What is the estimated genome size of your virus? To what depth did you sequence it, and with what library?
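
As a rough guide to the depth question, average depth is approximately the total number of sequenced bases divided by the genome size. A minimal sketch with placeholder numbers (the read count, read length, and genome size below are made up for illustration and are not taken from any dataset in this thread):

```python
def estimate_depth(num_reads, mean_read_length, genome_size):
    """Approximate average depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


# Placeholder values for illustration only.
depth = estimate_depth(num_reads=50_000, mean_read_length=200,
                       genome_size=100_000)
print(f"Estimated average depth: {depth:.0f}x")
```

For a viral sample this should be read as an upper bound, since some of the reads will come from the host rather than the virus.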

ADD COMMENT • link 4.8 years ago by liorglic ★ 1.4k

This is the link to my dataset - https://www.ncbi.nlm.nih.gov/bioproject/PRJNA309162 and the genome size is about 90 kb.

ADD REPLY • link 4.8 years ago by treesavictor83 • 0

OK, so if I understand correctly, it looks like you have decent coverage, but no paired-end info of any type. This means you can't really expect to do any scaffolding with just the data you currently have. How about using an existing assembly of another strain like this one? You could use a tool like RaGOO to do that. This approach assumes that your genome is pretty close to the reference. Also note that if you are looking for large SVs, this approach might not be optimal. What is your end goal? Why do you want to perform scaffolding? What's the planned downstream analysis?
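
The idea behind reference-guided scaffolding can be illustrated with a toy sketch. This is not RaGOO's actual algorithm or interface. It only assumes that you already have, for each contig, its best placement on the reference (start position and strand, from any aligner), and it then orders and orients the contigs accordingly, joining them with a fixed run of Ns. All names and the gap length are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_contigs_by_reference(contigs, placements, gap_length=100):
    """Order and orient contigs by their position on a reference.

    contigs:    dict of contig_id -> sequence
    placements: dict of contig_id -> (ref_start, strand), strand "+" or "-"
                (assumed to come from a prior alignment step)
    Returns (scaffold, placed_ids_in_order, unplaced_ids).
    """
    placed = sorted(
        (cid for cid in contigs if cid in placements),
        key=lambda cid: placements[cid][0],
    )
    oriented = []
    for cid in placed:
        seq = contigs[cid]
        if placements[cid][1] == "-":
            seq = reverse_complement(seq)  # flip contigs on the minus strand
        oriented.append(seq)
    # Gaps of unknown true size are represented by a fixed run of Ns.
    scaffold = ("N" * gap_length).join(oriented)
    unplaced = [cid for cid in contigs if cid not in placements]
    return scaffold, placed, unplaced


# Made-up contigs and placements for illustration only.
contigs = {"c1": "AAAC", "c2": "GGTT", "c3": "CCCC"}
placements = {"c1": (500, "+"), "c2": (10, "-")}
scaffold, placed, unplaced = order_contigs_by_reference(
    contigs, placements, gap_length=3
)
print(scaffold, placed, unplaced)
```

Because the order comes entirely from the reference, any true rearrangement between your strain and the reference is hidden by this kind of approach, which is why it is a poor fit when large SVs are of interest.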

ADD REPLY • link 4.8 years ago by liorglic ★ 1.4k

After constructing a draft genome, I plan to analyse variants, UTRs, and regulatory sites.

ADD REPLY • link 4.8 years ago by treesavictor83 • 0

Is it possible to use the RaGOO tool for viral genome assembly?

ADD REPLY • link 4.8 years ago by treesavictor83 • 0

I don't see a reason why not, but I really have no idea. The question is - do you really need a chromosome-level assembly? Contigs may be good enough for your analysis. What is your contig N50?
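
For reference, contig N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. A minimal sketch, assuming the contigs are in a plain FASTA file (the helper names and example values are hypothetical):

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Made-up lengths for illustration only.
print(n50([100, 200, 300, 400, 500]))
```

With a real assembly, `n50(fasta_lengths(open("contigs.fasta")))` would give the value for the whole contig set, where `contigs.fasta` is a placeholder file name.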

ADD REPLY • link 4.8 years ago by liorglic ★ 1.4k

The contig N50 value is 3064.

ADD REPLY • link 4.8 years ago by treesavictor83 • 0

Hi liorglic, could you tell me exactly which run was used to obtain that N50? I would like to run some tests (just for learning). I have tried SRR3107830 with SPAdes, for example, and I only get contigs under 500 bp in length.

ADD REPLY • link 4.2 years ago by juanjo75es ▴ 130

I think your data is not well suited for de novo assembly. There are no overlapping regions between the reads; the only overlaps are chimeras generated by PCR. Whoever designed the library used a preparation that is valid for aligning reads to a reference but not for de novo assembly. At least, that is the case in the run I analyzed, and I guess it will be the same in all of them.

ADD REPLY • link 4.2 years ago by juanjo75es ▴ 130