Question

improving de novo genome assembly

9.5 years ago

xvazquezc ▴ 10

Hi,

I have a couple of fungal genomes that I'm reassembling from scratch, as I didn't realise the first time how many Illumina adapters were still present in my reads.

I assembled them with Velvet, iterating over the k-mer length to find the optimum assembly based on the statistics reported by abyss-fac. I recently read that assemblies can be improved without further sequencing by at least a couple of different methods:

1. Map the reads against the assembly, extract the "properly paired" reads and reassemble them with the same k-mer length. Take a look here
2. Use dedicated software for this, such as REAPR(?)
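To make method #1 concrete, here is a minimal sketch of the "extract the properly paired reads" part, assuming the reads were mapped to the contigs and written out as plain-text SAM. It relies only on the SAM FLAG bits (0x2 marks a pair the aligner considers properly aligned); the function names are illustrative and do not come from the linked tutorial.

```python
# SAM FLAG bits used below (from the SAM format specification).
PROPER_PAIR = 0x2      # each segment properly aligned according to the aligner
UNMAPPED = 0x4         # this read is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def is_properly_paired(sam_line):
    """Return True for a primary, mapped record with the proper-pair bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return bool(flag & PROPER_PAIR)


def properly_paired_names(sam_lines):
    """Collect the names of reads that have a properly paired alignment."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        if is_properly_paired(line):
            names.add(line.split("\t", 1)[0])
    return names
```

The resulting set of read names could then be used to subset the original FASTQ files before running the assembler again. Whether that actually improves the assembly is a separate question.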

I proceeded with #1, and while the re-assembly of one genome gave exactly the same assembly statistics, the other changed quite a bit (top: initial, bottom: reassembly):

n       |n:500  |n:N50  |min    |N80    |N50    |N20    |E-size |max    |sum    |name
------  |------ |------ |------ |------ |------ |------ |------ |------ |------ |------
7290    |6860   |1122   |503    |4914   |11504  |22431  |14693  |124406 |43.73e6        |T2paper/velvet/k169/contigs.fa
10638   |8598   |1437   |500    |3980   |9021   |17417  |11669  |124406 |43.75e6        |T2paper/velvet/rek169/contigs.fa
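For readers unfamiliar with the columns above, the following is a rough sketch of how contiguity statistics such as N50, N80, N20 and E-size can be computed from contig lengths. It assumes a 500 bp minimum contig length, as the n:500 column suggests; it is not abyss-fac's own code, and its conventions may differ in detail.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)           # start a new contig
        elif line and lengths:
            lengths[-1] += len(line)    # sequence may span several lines
    return lengths


def nx(lengths, percent):
    """Length L such that contigs >= L contain at least `percent` % of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 100 >= percent * total:  # integer maths, no rounding issues
            return length
    return 0


def e_size(lengths):
    """Sum of squared lengths over total length: expected size of the contig holding a random base."""
    total = sum(lengths)
    return sum(l * l for l in lengths) / total if total else 0.0


def summarize(lengths, min_length=500):
    """Statistics over contigs of at least `min_length` bp (assumed cutoff)."""
    kept = [l for l in lengths if l >= min_length]
    return {
        "n": len(lengths),
        "n_kept": len(kept),
        "N80": nx(kept, 80),
        "N50": nx(kept, 50),
        "N20": nx(kept, 20),
        "E-size": e_size(kept),
        "max": max(kept, default=0),
        "sum": sum(kept),
    }
```

Running `summarize(contig_lengths(open("contigs.fa")))` on both assemblies gives two dictionaries that can be compared key by key. Note that these numbers only describe contiguity, not correctness.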

So, the question...

Is this step common?
Is there an easy way to compare them side by side, or to evaluate the assemblies without relying on those numbers?

Thank you in advance,
Xabier

I think step #1 in the linked post does not refer to extracting the reads and reassembling them, but rather to estimating the fragment length from the paired-end reads and then reassembling with that information.

Edit: They do have this "Now we created a fairly good assembly, but lets see if we can do it better. Lets try to map the reads to the assembly and then only use mapped reads for another assembly.", but, as others said, I don't think this will help the assembly.

Reply • 9.5 years ago by Adrian Pelin ★ 2.6k

score 0 · Answer 1 · 2015-10-23

9.5 years ago

Brian Bushnell 20k

You should trim your adapters with an adapter-trimming tool like BBDuk first. You can get a rough evaluation of your assembly quality with tools like Quast; generally, the more long genes (1500bp+ and 3000bp+) are called, the better the assembly.

That's why I'm redoing the assemblies. The first time I used SolexaQA, but I didn't search for adapters.

I now use Trim Galore! for that. It is a quality and adapter trimmer.
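A quick way to check whether adapters are still present before and after trimming is to count the reads that contain the start of the adapter sequence. The sketch below assumes standard four-line FASTQ records and uses `AGATCGGAAGAGC`, the common prefix of the Illumina universal/TruSeq adapters, as the search string; other library kits may need a different sequence, and the function name is only illustrative.

```python
# Assumption: the library was built with standard Illumina TruSeq-style adapters.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(fastq_lines, adapter=ILLUMINA_ADAPTER_PREFIX):
    """Fraction of reads whose sequence contains `adapter` (exact match only)."""
    total = 0
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each 4-line FASTQ record is the sequence
            total += 1
            if adapter in line.strip().upper():
                hits += 1
    return hits / total if total else 0.0
```

For example, `adapter_fraction(open("reads_1.fastq"))` (file name hypothetical) should drop close to zero after trimming. Exact matching misses adapters with sequencing errors or very short adapter remnants at read ends, so this is only a sanity check, not a replacement for a proper trimming tool.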

Reply • 9.4 years ago by xvazquezc ▴ 10

score 0 · Answer 2 · 2015-10-23

9.5 years ago

Rayan Chikhi ★ 1.6k

No, I don't think step #1 is common. In general, reassembling using only the reads that mapped properly to the contigs is unlikely to give you a better assembly.

score 0 · Answer 3 · 2015-10-23

A colleague of mine had been trying to close a 6 Mb bacterial genome for years. It turned out that the genome had too many repeated sequences, and that means trouble.

He eventually closed the circle by running PacBio sequencing and a hybrid assembly.

This is something you should consider very seriously.