10.0 years ago
muralis.bio
Hi,
I have 454 (single-end) data at 20x coverage for a plant genome of about 1.2 Gb. Can anyone give me a pointer on the minimum read length I should use for a de novo assembly with Celera Assembler (CABOG)?
Thanks in advance...
It is difficult to tell without looking at the read length distribution.
The following is my read length distribution:
Greater than (>)   Less than or equal to (<=)   No. of seq
               0                           60       163078
              60                          120      1320262
             120                          180      1128226
             180                          240      1101897
             240                          300      1257487
             300                          400      3928630
             400                          500      4911455
             500                          600      2488924
             600                          700      2936272
             700                          800      3610518
             800                          900      4271892
             900                         1000      5989034
            1000                         1100      5135502
            1100                         1200       513789
            1200                         1300        11654
            1300                         1400          329
            1400                         1500          102
            1500                         1600           75
            1600                         1700           88
            1700                         1800          196
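To judge what a given minimum length would cost, it may help to tally how many reads fall at or below a candidate cutoff. The sketch below is only an illustration: the bin counts are copied from the distribution above, while the names `BINS`, `total_reads` and `reads_at_or_below` are made up for this example. The count is exact only when the cutoff coincides with a bin edge, since the per-read lengths inside each bin are not known.

```python
# Binned read-length distribution from the post: (lower, upper, count),
# where each bin holds reads with lower < length <= upper.
BINS = [
    (0, 60, 163078), (60, 120, 1320262), (120, 180, 1128226),
    (180, 240, 1101897), (240, 300, 1257487), (300, 400, 3928630),
    (400, 500, 4911455), (500, 600, 2488924), (600, 700, 2936272),
    (700, 800, 3610518), (800, 900, 4271892), (900, 1000, 5989034),
    (1000, 1100, 5135502), (1100, 1200, 513789), (1200, 1300, 11654),
    (1300, 1400, 329), (1400, 1500, 102), (1500, 1600, 75),
    (1600, 1700, 88), (1700, 1800, 196),
]


def total_reads(bins):
    """Total number of reads across all bins."""
    return sum(count for _, _, count in bins)


def reads_at_or_below(bins, cutoff):
    """Reads in bins whose upper edge is <= cutoff (exact at bin edges only)."""
    return sum(count for _, upper, count in bins if upper <= cutoff)


if __name__ == "__main__":
    total = total_reads(BINS)
    for cutoff in (60, 120, 180, 240, 300):
        dropped = reads_at_or_below(BINS, cutoff)
        print(f"cutoff {cutoff}: drop {dropped} of {total} reads "
              f"({100.0 * dropped / total:.2f}%)")
```

This counts reads, not bases; short reads contribute proportionally less to coverage than their read count suggests.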
A histogram would be a better way to show the read length distribution. In general, you need to choose a k-mer value when assembling the reads, so remove the reads that are shorter than twice the k-mer length. There is no single correct cutoff, though: you will need to run the assembly several times with different settings and compare the results to get the best assembly.
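The "twice the k-mer length" rule of thumb can be sketched as a small filter. This is a minimal illustration, not part of any assembler: it assumes the reads have already been converted to FASTA, and the function names `read_fasta` and `filter_by_kmer`, the file name `reads.fasta` and the value of `k` are all hypothetical.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_kmer(records, k):
    """Keep only reads at least 2 * k bases long."""
    min_len = 2 * k
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


if __name__ == "__main__":
    # Hypothetical usage: python filter_reads.py reads.fasta 31 > filtered.fasta
    path, k = sys.argv[1], int(sys.argv[2])
    with open(path) as handle:
        for header, seq in filter_by_kmer(read_fasta(handle), k):
            print(f">{header}\n{seq}")
```

Running this for a few values of `k` and assembling each filtered set is one way to do the repeated trials suggested above.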