Question

How To Assemble Chloroplast Genome?

3

12.6 years ago

Biomonika (Noolean) 3.2k

I have about 100k 454 reads and I am about to do a de novo assembly of a chloroplast genome. I would therefore like to ask:

1. What are the caveats of such an effort (for example, the IR regions, which map onto each other)?

2. Which tools or approaches would you recommend?

Thanks a lot for any advice.

genome assembly • 9.7k views

ADD COMMENT • link updated 22 months ago by Ram 44k • written 12.6 years ago by Biomonika (Noolean) 3.2k

0

Hi, I have about 12 GB of whole-genome Illumina sequencing data. I tried to assemble 50% of the data (SOAPdenovo) with the Geneious software, but I couldn't get large contigs; the largest contig was about 70 kb. Since I want to assemble complete chloroplast and mitochondrial sequences, this does not seem to be the right way. Do you have any idea or opinion on what I should change in my assembler (SOAPdenovo) to get larger contigs? Thanks.

ADD REPLY • link updated 22 months ago by Ram 44k • written 9.5 years ago by river202002 • 0

0

This is not an answer to the above question. You should ask this as a separate question.

ADD REPLY • link updated 22 months ago by Ram 44k • written 9.5 years ago by VS ▴ 740

score 5 · Answer 1 · 2013-09-30

5

11.2 years ago

SES 8.6k

I have assembled many chloroplast genomes and this is something we are doing on a pretty large scale now to complement other phylogenomics approaches. First, I don't think 100k reads will give you enough coverage to assemble a complete genome, but there are other good options. It is likely that you have other sequence data such as WGS reads or some kind of sequence capture data. In my experience, all sequence data sets (in plants), whether from targeted or shotgun approaches, will contain chloroplast and mitochondria fragments. You should be filtering this data anyway as it is a source of contamination, but these filtered reads can actually be combined with other data and used to assemble the chloroplast genome.

One very important step is to estimate the coverage of the genome you are assembling, which you can do by mapping your reads to a closely related species. Picking an appropriate coverage cutoff will lead to a more complete and contiguous assembly. The most common mistake I see people making is trying to assemble with crazy high coverage (>1000X); this does not give good results and takes longer to execute.

I don't think it is possible to resolve the IR regions de novo (at least, in my experience), but it can be done by using a reference.

My advice would be to assemble your genome with Newbler or MIRA as Leonor suggested, then use ABACAS to order your contigs relative to the reference. That will help you fill in the gaps and transfer annotations. The caveat with this approach is that you need a reference from a closely related species because you are assuming the same gene order. Depending on the species being compared, this is probably a safe assumption because chloroplast genomes evolve more slowly than nuclear genomes.

ADD COMMENT • link 11.2 years ago by SES 8.6k

0

Is this the ABACAS software you recommend? http://www.ncbi.nlm.nih.gov/pubmed/19497936 Thanks.

ADD REPLY • link 11.2 years ago by Biomonika (Noolean) 3.2k

1

Yep, that's the one. The project is on sourceforge.

ADD REPLY • link 11.2 years ago by SES 8.6k

0

I have 3 GB of data for one chloroplast genome; is that too much? Could you tell me how much I should sequence?

ADD REPLY • link 9.6 years ago by wpwupingwp ▴ 120

0

I'm not sure if you mean 3 gigabases or a 3 gigabyte file, but a chloroplast genome is typically about 150 kb. You can use that as a guide to figure out how much coverage you have. If your coverage is really high (e.g., >200X), then I would down sample the data.
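The arithmetic behind that guide can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names and the `organelle_fraction` parameter are hypothetical, and the fraction of reads that actually come from the chloroplast has to be measured on your own data (for example by mapping to a related species), not assumed.

```python
def estimate_coverage(total_bases, genome_size=150_000, organelle_fraction=1.0):
    """Rough X-coverage of the chloroplast genome.

    total_bases        -- number of sequenced bases in the data set
    genome_size        -- chloroplast genome size in bp (about 150 kb is typical)
    organelle_fraction -- ASSUMED share of the bases that are chloroplast-derived;
                          1.0 means every read comes from the chloroplast
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases * organelle_fraction / genome_size


def sampling_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to go from current to target coverage."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    # Never ask for more reads than are available.
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    # Illustrative numbers only: 3 gigabases, 5% of them from the chloroplast.
    coverage = estimate_coverage(3_000_000_000, organelle_fraction=0.05)
    print(f"estimated coverage: {coverage:.0f}X")
    print(f"keep fraction for 100X: {sampling_fraction(coverage, 100):.3f}")
```

With those illustrative inputs the estimate is about 1000X, which is well above the >200X level at which downsampling is suggested here.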

ADD REPLY • link 9.6 years ago by SES 8.6k

0

I have 3 GB of data for one sample; is that too much? Does it mean I should (randomly?) drop the extra data to get good assembly results?

I will continue to sequence more chloroplast genomes. Could you please tell me how much data I should get for each genome, for example 300M?

I hope this many questions doesn't bother you.

ADD REPLY • link 9.6 years ago by wpwupingwp ▴ 120

0

No worries, I don't mind answering questions. Yes, you should down sample the data randomly to achieve the desired coverage. Think in terms of X-coverage of the genome because that makes more sense, and in that case, I would try a few assemblies between 60X and 200X to see which is best. Likely, the best assembly will be in that range.
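A minimal sketch of random downsampling, assuming uncompressed four-line FASTQ records. The function names, file paths, and default seed are hypothetical, and this is an illustration rather than the code used for the assemblies discussed in this thread.

```python
import random


def read_fastq(handle):
    """Yield FASTQ records as lists of four raw lines (assumes 4-line records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # end of file
        if not record[3]:
            raise ValueError("truncated FASTQ record")
        yield record


def downsample_fastq(in_path, out_path, fraction, seed=42):
    """Keep each read with probability `fraction`; return the number kept."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in read_fastq(src):
            # One random draw per record, independent of the read content.
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept
```

Because exactly one random number is drawn per record, running the function with the same seed and fraction on the R1 and R2 files of a paired-end run keeps the same record positions in both, provided the two files list the reads in the same order. To try several assemblies between 60X and 200X, call it once per target with the corresponding fraction.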

ADD REPLY • link 9.6 years ago by SES 8.6k

0

How would you downsample? Digital normalization using C=60? Or randomly sampling X number of reads? Thank you.

ADD REPLY • link 9.0 years ago by int11ap1 ▴ 490

0

For reference, I created an application called Chloro to make the process of assembling chloroplast genomes a bit easier. There are some nice features and it worked pretty well for us. There are also a couple of things I would like to improve if I have the need to do this work again in the future, but I'm too busy right now to tinker with performance/accuracy of side projects. Maybe it will help someone.

ADD REPLY • link 9.3 years ago by SES 8.6k

score 4 · Answer 2 · 2012-05-02

100k of 454 reads is not a huge assembly, so I would try starting with a basic procedure and then see which problems arise and try to improve the assembly. I would first use the proprietary Roche assembler (aka Newbler), which you can download here; this would give you an idea of the length of contigs you can achieve:

http://454.com/contact-us%5Csoftware-request.asp

I never assembled a chloroplast, but concerning repetitive segments, I have had a very good experience with the MIRA assembler:

http://www.chevreux.org/projects_mira.html

Depending on the type of repeats (very small ones or rather large regions), the approach to deal with them will be different. Could you give us some information on the characteristics you are expecting for your repeats, so that we can help you there?

score 1 · Answer 3 · 2015-05-14

1

9.5 years ago

h.mon 35k

You could try MITObim, which uses MIRA to bait reads and assemble them, starting from a close reference or even a small contig starter.

ADD COMMENT • link 9.3 years ago by h.mon 35k

Ram · Answer 4 · 2015-06-21

1

9.4 years ago

nicolasdierckxsens ▴ 40

I developed a de novo assembler specifically for chloroplasts and mitochondria. It can assemble a chloroplast genome into one contig within one hour from whole-genome Illumina data. I compared the results with MIRA and MITObim; the quality seems to be higher, and it always assembles the whole chloroplast and only the chloroplast. As output you get two FASTA files; the only difference between them is the orientation of the region between the inverted repeats. If you BLAST both files against a reference, you can select the correct orientation. I can send the link where you can download the tool once it is available online.
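The orientation check can also be illustrated without BLAST. The sketch below swaps in a simpler technique, counting strand-specific k-mers shared with the reference, and picks the candidate that shares more. It is not part of the assembler described here; the function names and the default k of 21 are assumptions, and circularity of the genome is ignored.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmers(seq, k):
    """Set of strand-specific k-mers (no canonicalisation) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_score(reference, candidate, k=21):
    """Number of reference k-mers found in the candidate.

    The candidate is tried as given and reverse-complemented, because the
    whole assembly may be reported on the opposite strand of the reference.
    """
    ref_kmers = kmers(reference, k)
    forward = len(ref_kmers & kmers(candidate, k))
    reverse = len(ref_kmers & kmers(reverse_complement(candidate), k))
    return max(forward, reverse)


def pick_orientation(reference, candidates, k=21):
    """Return the name of the candidate sharing the most k-mers with the reference.

    candidates -- dict mapping a name (e.g. a file name) to its sequence
    """
    return max(candidates, key=lambda name: shared_kmer_score(reference, candidates[name], k))
```

The k-mers are deliberately kept strand-specific: with canonical k-mers the two candidates would look almost identical, since they differ only in the strand of one region. A real comparison should still be confirmed with an alignment against the reference.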

ADD COMMENT • link 9.4 years ago by nicolasdierckxsens ▴ 40

0

Please let us know where to find your chloroplast and mitochondrial genome assembler.

ADD REPLY • link 9.4 years ago by lutz.fr ▴ 10

0

I am still testing it and have to make some more adjustments, but I will try to make it available online as soon as possible. If you send me Illumina reads, I could also run it quickly and send you the results. I have to test it on different datasets anyway.

ADD REPLY • link 9.4 years ago by nicolasdierckxsens ▴ 40

0

Dear friend, I have tested many assemblers on my chloroplast data, such as MISA, MITObim, Velvet, and SPAdes, but the results are not very good. So I am very interested in your assembler for chloroplasts and mitochondria. Can you tell me where I can get it? Thank you very much.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.3 years ago by xinfengya • 0

0

Hi, sorry, I didn't see your reply. I am currently adjusting it for whole-genome assembly, but I could already post the plastid version. I haven't published it yet, but I will ask when I can put it online.

It is easy to work with: you can run it on your laptop and it doesn't need any installation.

ADD REPLY • link 9.2 years ago by nicolasdierckxsens ▴ 40

0

Hi, Nicolas!

I'm interested in your assembler for chloroplasts. Do you have an idea of when you will publish it?

Thank you so much!

ADD REPLY • link 9.2 years ago by lilian.o.machado • 0

1

I will put it online next month, I will send the link when it's done.

It currently works for paired-end Illumina data, but I could add other platforms. Does anyone have an idea what other kinds of reads are used for chloroplast assemblies? And what coverage is usually available? If anyone has experience with this, please let me know and I will try to modify my tool.

ADD REPLY • link 9.2 years ago by nicolasdierckxsens ▴ 40

0

Hi nicolas,

I am interested in your software. Can I get the download link now?

ADD REPLY • link 8.6 years ago by vicky.bioinfo • 0

0

Hi Nicolas, I'm interested in your assembler for chloroplasts. How can I get it? Thanks

ADD REPLY • link 8.4 years ago by mssmfeitosa • 0

1

Hi, sorry, I forgot to post the link:

https://github.com/ndierckx/NOVOPlasty

It works for Illumina paired-end reads derived from whole-genome data (no capture DNA). It is best not to trim or filter the reads; use the raw data! I will upload the tool very soon, but if you mail me through GitHub, I will send it to you the same day. If the chloroplast doesn't have many repetitive regions, it should assemble into one circular contig (I have assembled over 80 chloroplast genomes, each in one circular contig and within 30 min).

ADD REPLY • link 8.4 years ago by nicolasdierckxsens ▴ 40