Construction of a draft genome from contigs
4.9 years ago

Hi, I am working on a viral genome, and my reads were generated with Ion Torrent sequencing. I performed de novo assembly with Trinity and obtained contigs. The next step is to construct a draft genome from these contigs. Are there any tools or software for building a draft genome from contigs? Can anyone please help me?

Assembly • 3.5k views

The genome size is approximately 19 kb.

4.9 years ago
h.mon 35k

There are several assembly pipelines geared towards viral assembly. Because viral samples are a mix of host and viral sequences, one key step is "contaminant" removal. Depending on the pipeline (and on whether host and viral reference genomes are available, and whether viral material was enriched before library preparation), host sequence removal can be performed before or after assembly. Check some of the available pipelines:

Personally, I was able to assemble several complete RNA viral genomes with my local version of the above pipelines:

  1. filter viral reads / filter out host reads
  2. assembly with Trinity (I obtained better results compared to SPAdes)
  3. filter for virus contigs (using BLAST+ / DIAMOND, and possibly coverage)
  4. assembly of the selected contigs with CAP3
  5. a possible final step is visualizing the assembly and mapped reads onto it with IGV, for manual corrections
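Step 3 can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline described above. It assumes BLAST+ was run with `-outfmt 6` (default 12 columns, E-value in column 11) against a viral-only database, so any hit below the cutoff counts as "viral". The function names and the `evalue_cutoff` default of `1e-5` are illustrative choices. There is no single standard E-value, so treat the cutoff as a parameter to tune.

```python
"""Sketch: keep contigs with a significant BLAST hit (pipeline step 3).

Assumption: BLAST+ tabular output (-outfmt 6, default columns) from a search
against a viral-only database, so any significant hit is treated as viral.
"""
from typing import Dict, Iterable, Set


def best_evalues(blast_lines: Iterable[str]) -> Dict[str, float]:
    """Return the smallest E-value seen for each query contig."""
    best: Dict[str, float] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        cols = line.split("\t")
        qseqid, evalue = cols[0], float(cols[10])  # column 11 = evalue in outfmt 6
        if qseqid not in best or evalue < best[qseqid]:
            best[qseqid] = evalue
    return best


def viral_contigs(blast_lines: Iterable[str], evalue_cutoff: float = 1e-5) -> Set[str]:
    """Contigs whose best hit passes the (illustrative, tunable) E-value cutoff."""
    return {q for q, e in best_evalues(blast_lines).items() if e <= evalue_cutoff}


if __name__ == "__main__":
    # Made-up example rows in outfmt 6 column order.
    rows = [
        "TRINITY_DN1_c0_g1_i1\tvirusA\t98.5\t1200\t18\t0\t1\t1200\t500\t1699\t0.0\t2100",
        "TRINITY_DN2_c0_g1_i1\tvirusB\t70.1\t150\t45\t2\t10\t159\t20\t169\t0.3\t35.0",
    ]
    print(sorted(viral_contigs(rows)))
```

The selected contig IDs can then be used to pull the corresponding sequences out of the Trinity FASTA before the CAP3 step. Coverage, mentioned above as a possible extra filter, is not handled here.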

Thank you for your kind reply. After running BLAST to filter the viral contigs, what E-value cutoff should I use? Is there a standard value?

4.9 years ago
onestop_data ▴ 330

A set of contigs for an organism is already considered a draft genome (at least, that is what I think). Do you mean scaffolding these contigs, i.e. joining them to improve the N50 of your draft genome?
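N50 is the length L such that contigs of length L or longer together contain at least half of the assembled bases. Here is a minimal Python sketch that computes it from a contig FASTA. The function names are illustrative, and the FASTA reader is a simplified parser, not a full-featured one.

```python
"""Sketch: contig N50 from FASTA lines (simplified parser, illustrative names)."""
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each FASTA record, in file order."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start counting a new record
        elif current is not None:
            current += len(line)  # sequence may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs >= L cover at least half of the total bases."""
    sizes = sorted(lengths, reverse=True)
    total = sum(sizes)
    if total == 0:
        raise ValueError("no contigs / no bases")
    running = 0
    for size in sizes:
        running += size
        if 2 * running >= total:  # reached half of the assembly
            return size
    raise AssertionError("unreachable for positive lengths")


if __name__ == "__main__":
    print(n50([100, 200, 300, 400, 500]))  # prints 400
```

Scaffolding raises N50 because joining contigs produces fewer, longer sequences. The total number of assembled bases stays roughly the same, apart from any N padding added for gaps.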


Yes, exactly. How can I build scaffolds from those contigs? Are there any tools or software available for this?

4.9 years ago
liorglic ★ 1.5k

Hi,
I have some experience with eukaryote genome assembly from Illumina reads, but I've never worked on viruses or Ion Torrent data. Still, a few things to note.
I'm not sure Trinity is the most suitable tool, since it is intended for transcriptome assembly, not genome assembly (and it does a great job at that). Why did you choose it? You might be better off with a genome assembler like SPAdes.
The next step after creating contigs is usually scaffolding, in which you create longer, gapped fragments out of your contigs, called scaffolds. To do that you can use paired-end data (preferably long distance, e.g. mate pair) or long reads. SPAdes will perform scaffolding as part of the assembly, if you provide the relevant data.
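Once a scaffolder has decided the order and orientation of the contigs, the scaffold itself is simple. It is the contigs concatenated in that order, with reverse-strand contigs reverse-complemented and each gap filled with a run of N. The Python sketch below shows only that last step. The placement tuple format is hypothetical, and the 100-N default for gaps of unknown size is an illustrative choice. Inferring order, orientation and gap sizes from paired-end or long-read data, which is the real work a scaffolder does, is not shown.

```python
"""Sketch: join already-ordered contigs into one gapped scaffold.

Assumption: order, strand and gap estimates come from elsewhere (e.g. a
scaffolder using paired-end links); the placement format is hypothetical.
"""
from typing import Dict, List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Hypothetical placement: (contig_id, strand "+" or "-", estimated gap to next contig or None)
Placement = Tuple[str, str, Optional[int]]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGTN only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(contigs: Dict[str, str], placements: List[Placement],
                   unknown_gap: int = 100) -> str:
    parts: List[str] = []
    for i, (cid, strand, gap_after) in enumerate(placements):
        seq = contigs[cid]
        if strand == "-":
            seq = reverse_complement(seq)
        elif strand != "+":
            raise ValueError(f"bad strand {strand!r} for {cid}")
        parts.append(seq)
        if i < len(placements) - 1:
            # Unknown (None) or non-positive gap (possible overlap): pad with a
            # fixed run of N. Real scaffolders resolve overlaps; this sketch does not.
            n = gap_after if gap_after is not None and gap_after > 0 else unknown_gap
            parts.append("N" * n)
    return "".join(parts)


if __name__ == "__main__":
    contigs = {"a": "AAC", "b": "GGT"}
    print(build_scaffold(contigs, [("a", "+", 5), ("b", "-", None)]))  # prints AACNNNNNACC
```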
Achieving a complete draft genome (i.e. a single sequence per chromosome) is usually challenging, although it might be possible with small, simple viral genomes. You'll have to use either an existing reference genome of a closely related virus or additional information from genetic, physical or optical maps. I also hear that people interested in viral genomics use Oxford Nanopore (ONT) sequencing, as it can sometimes sequence the whole genome in one read, so no assembly is needed. However, this technology has its own problems and would require you to sequence everything again.
What is the estimated genome size of your virus? To what depth did you sequence it, and with what library?


This is the link to my dataset: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA309162. The genome size is about 90 kb.


OK, so if I understand correctly, you have decent coverage but no paired-end information of any kind. This means you can't really expect to do any scaffolding with the data you currently have. How about using an existing assembly of another strain, like this one? You could use a tool like RaGOO to do that. This approach assumes that your genome is fairly close to the reference. Also note that if you are looking for large structural variants (SVs), this approach might not be optimal. What is your end goal? Why do you want to perform scaffolding? What downstream analysis do you plan?
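The core idea of reference-guided scaffolding is to align the contigs to a related reference and then order and orient them by where and on which strand they land. RaGOO does this with minimap2 alignments. The Python sketch below shows only that core idea. It is **not** RaGOO's algorithm, which also handles chimeric contigs, multiple reference sequences and placement confidence. It assumes PAF-format alignments against a single reference sequence (strand in column 5, target start in column 8, alignment block length in column 11), and it places each contig by its longest alignment block. The function name is hypothetical.

```python
"""Sketch: order and orient contigs along one reference from PAF alignments.

Assumptions: PAF input (e.g. from minimap2), a single reference sequence,
and "longest alignment block wins" for each contig. Not RaGOO's method.
"""
from typing import Dict, Iterable, List, Tuple


def order_contigs_from_paf(paf_lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (contig, strand, reference_start) sorted by reference_start."""
    best: Dict[str, Tuple[int, str, int]] = {}
    for line in paf_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        qname, strand = cols[0], cols[4]                # PAF col 1 and col 5
        tstart, block_len = int(cols[7]), int(cols[10])  # PAF col 8 and col 11
        # Keep only the longest alignment block per contig.
        if qname not in best or block_len > best[qname][0]:
            best[qname] = (block_len, strand, tstart)
    placed = [(q, s, t) for q, (_, s, t) in best.items()]
    return sorted(placed, key=lambda item: item[2])


if __name__ == "__main__":
    paf = [
        "c2\t1000\t0\t1000\t-\tref\t90000\t5000\t6000\t950\t1000\t60",
        "c1\t800\t0\t800\t+\tref\t90000\t100\t900\t780\t800\t60",
        "c1\t800\t0\t200\t-\tref\t90000\t50000\t50200\t150\t200\t5",
    ]
    for contig, strand, start in order_contigs_from_paf(paf):
        print(contig, strand, start)
```

Contigs with no alignment are simply left out here. Using reference coordinates to place contigs assumes your genome is colinear with the reference, which is exactly why this approach can miss large SVs.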


After constructing a draft genome, I plan to analyse variants, UTRs, and regulatory sites.


Is it possible to use the RaGOO tool for viral genome assembly?


I don't see a reason why not, but I really have no idea. The question is - do you really need a chromosome-level assembly? Contigs may be good enough for your analysis. What is your contig N50?


The contig N50 is 3064 bp.


Hi liorglic, could you tell me exactly which run you used to obtain that N50? I would like to run some tests (just for learning). For example, I have tried SRR3107830 with SPAdes, and I only get contigs shorter than 500 bp.


I think your data is not well suited for de novo assembly. There are no overlapping regions between the reads; the only overlaps are chimeras generated by PCR. Whoever designed the library used a preparation method that is suitable for aligning reads to a reference, but not for de novo assembly. At least that is the case in the run I analysed, and I suspect it is the same for all of them.
