8.5 years ago
nguyendai1992
Hi everyone, I have a data set with quality greater than 20. I want to know the best way to assemble this data. Thanks
I agree with the sentiment of @b.nota; however, SPAdes might be good to look into.
If you want your assembly to be of high quality, start over and generate better data. You can't expect magic, or that some bioinformatic hocus-pocus will solve your problem.
If your input is of low quality, you can't expect that your output will be of good quality...
If you believe in alchemy and want to try it anyway, you could try something like MIRA or Velvet (I don't know what platform you used).
I imagine that for low-quality data, any overlap-layout-consensus (OLC) assembler is better than the k-mer/de Bruijn graph approaches.
First of all, the information in the title and in the question don't match.
The title says "low quality" data, but the body of the post says "greater than 20" (which I will assume means Q20 or greater). If it is the latter, then there is no problem; if it is the former, the assembly may still be fine.
We don't have enough reliable information here to say anything about the final quality of the assembly.
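For anyone unsure what "Q20" means in practice: a Phred quality score Q corresponds to a per-base error probability of 10^(-Q/10). The short sketch below just evaluates that formula; the function name `phred_to_error_prob` is made up for illustration and is not from any particular library.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Compare the thresholds discussed in this thread.
    for q in (10, 20, 30):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.4f}, roughly 1 wrong call in {round(1 / p)} bases")
```

So Q20 means about 1 error per 100 bases and Q30 about 1 per 1000, which is why Q20 data is not necessarily "low quality" for assembly.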
Sorry, my English is not good and I can't express the question exactly :D. I'm a beginner. I've read some papers and I see that most data is Q30 or greater, while my data is Q20, so I think its quality is low. By the way, can you tell me the quality threshold for assembly? Thanks
I will assume that you have Illumina data, since you have not told us what kind. If it is not Illumina data, then the following may not work/apply.
Since this is a de novo assembly, you may want to trim bases that are Q10 or below. If trimming leaves you with less than 10-15x coverage (total bases of sequence divided by the genome size you expect), you could still try doing an assembly, but be warned that the results may be poor and you may need to start over. Use SPAdes as recommended in this thread.
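To make the trimming and coverage check concrete, here is a minimal sketch in plain Python. It is only an illustration under simple assumptions: it trims from the 3' end only, base by base, whereas dedicated trimming tools usually do something more careful (for example sliding windows). The function names, the example read, and the numbers are all hypothetical.

```python
def trim_3prime(seq, quals, max_bad_q=10):
    """Drop bases from the 3' end while their Phred quality is max_bad_q or below."""
    end = len(seq)
    while end > 0 and quals[end - 1] <= max_bad_q:
        end -= 1
    return seq[:end], quals[:end]


def estimated_coverage(total_bases, genome_size):
    """Fold coverage = total sequenced bases / expected genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # Hypothetical read whose quality decays toward the 3' end.
    seq, quals = trim_3prime("ACGTACGT", [30, 30, 25, 22, 20, 12, 10, 8])
    print(seq, quals)

    # Hypothetical totals: 60 Mbp of trimmed sequence for a 5 Mbp genome.
    cov = estimated_coverage(60_000_000, 5_000_000)
    print(f"Estimated coverage after trimming: {cov:.1f}x")
```

If the estimated coverage comes out below the 10-15x mentioned above, treat any assembly from that data with caution.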
If there is a related genome available at NCBI, you can always try to align your data to it and see what you get. As long as the organisms are reasonably related, you may be able to map 80%+ of your data. Hope this helps.