Question

How can I assemble a bacterial genome from low-quality data?

0

8.5 years ago

nguyendai1992 • 0

Hi everyone, I have a data set with quality greater than 20. What is the best way to assemble this data? Thanks

Assembly • 1.3k views

ADD COMMENT • link updated 8.5 years ago by WouterDeCoster 47k • written 8.5 years ago by nguyendai1992 • 0

2

If your input is of low quality, you can't expect that your output will be of good quality...

If you believe in alchemy and want to try it anyway, you could try something like MIRA or Velvet (I don't know what platform you used).

ADD REPLY • link 8.5 years ago by Benn 8.4k

1

I imagine that for low-quality data any overlap-layout-consensus (OLC) based assembler is better than the k-mer/de Bruijn graph stuff.
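
One way to see why sequencing errors hurt k-mer based assemblers: a single wrong base changes every k-mer that overlaps it, so up to k k-mers per error. The snippet below is only a toy illustration with a made-up read and an arbitrary k; the function names are hypothetical and it does not reproduce what any real assembler does internally.

```python
def kmers(seq, k):
    """Return all k-mers of seq, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def corrupted_kmers(read, error_pos, k):
    """Count k-mers that change when the base at error_pos is substituted."""
    # Pick a replacement base that is guaranteed to differ from the original.
    wrong = "A" if read[error_pos] != "A" else "C"
    bad = read[:error_pos] + wrong + read[error_pos + 1:]
    # Compare k-mers position by position between the clean and the bad read.
    return sum(a != b for a, b in zip(kmers(read, k), kmers(bad, k)))


read = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGA"  # made-up 32 bp read (assumption)
k = 11                                     # arbitrary k for illustration
changed = corrupted_kmers(read, error_pos=15, k=k)
print(changed, "of", len(kmers(read, k)), "k-mers changed by one error")
```

An error in the middle of the read touches k k-mers; an error near either end touches fewer, because fewer k-mers overlap that position.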

ADD REPLY • link 8.5 years ago by 5heikki 11k

0

First of all, the information in the title and in the question don't match.

The title says "low quality" data but the body of the post says "greater than 20" (which I will assume means Q20 or greater). If it is the latter, then there is no problem, but even if it is the former, the assembly may still be fine.

We don't have enough reliable information here to say anything about the final quality of assembly.
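
For reference, a Phred quality score Q corresponds to a per-base error probability of 10^(-Q/10), so Q20 means roughly 1 error in 100 bases and Q30 roughly 1 in 1,000. A minimal sketch of that conversion (the function names are just illustrative):

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: about 1 error in {round(1 / p)} bases")
```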

ADD REPLY • link 8.5 years ago by GenoMax 148k

0

Sorry, my English is not good and I couldn't express the question exactly :D. I'm a beginner. I've read some papers and I see that most data is Q30 or greater, while my data is Q20, so I think its quality is low. By the way, can you tell me the quality threshold needed for assembly? Thanks

ADD REPLY • link 8.5 years ago by nguyendai1992 • 0

0

I will assume that you have Illumina data, since you have not told us what kind. If it is not Illumina data, then the following may not work/apply.

Since this is a de novo assembly, you may want to trim data that is Q10 or below. If trimming leaves less than 10-15x coverage (raw bases of sequence relative to the genome size you expect), you could still try doing an assembly, but be warned that the results may be poor and you may need to start over. Use SPAdes as recommended below.
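
As a rough sketch of that check: trim low-quality bases, then divide the remaining bases by the expected genome size. The trimming here is a naive 3'-end trim written for illustration, not what dedicated trimming tools do, and the read counts and genome size are made-up numbers.

```python
def trim_3prime(seq, quals, min_q=10):
    """Cut bases off the 3' end while their quality is min_q or below."""
    end = len(seq)
    while end > 0 and quals[end - 1] <= min_q:
        end -= 1
    return seq[:end], quals[:end]


def coverage(read_lengths, genome_size):
    """Estimated fold coverage: total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size


# Toy read whose quality drops toward the 3' end (made-up values).
seq, quals = trim_3prime("ACGTACGT", [30, 30, 25, 20, 12, 10, 8, 2])
print(seq, quals)

# Hypothetical run: 200,000 trimmed reads of 100 bp, 5 Mb expected genome.
cov = coverage([100] * 200_000, genome_size=5_000_000)
print(f"about {cov:.1f}x coverage")
```

With these made-up numbers the estimate falls below the 10-15x mentioned above, which is the situation where the assembly may turn out poor.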

If there is a related genome available at NCBI, you can always try to align your data to it and see what you get. As long as the organisms are reasonably related, you may be able to map 80%+ of your data. Hope this helps.

ADD REPLY • link 8.5 years ago by GenoMax 148k

score 2 · Answer 1 · 2016-06-03

2

8.5 years ago

andrew.j.skelton73 6.6k

I agree with the sentiment of @b.nota; however, SPAdes might be worth looking into.

ADD COMMENT • link 8.5 years ago by andrew.j.skelton73 6.6k

score 1 · Answer 2 · 2016-06-03

1

8.5 years ago

WouterDeCoster 47k

If you want your assembly to be of high quality, start over and generate better data. You can't expect that magic and some bioinformatic hocus-pocus will solve your problem.