Improving de novo genome assembly
9.1 years ago
xvazquezc ▴ 10

Hi,

I have a couple of fungal genomes that I'm reassembling from scratch, as I didn't initially realise how many Illumina adapters were still present in my reads.

I assembled them with Velvet, iterating over the k-mer length to find the optimal assembly based on the statistics reported by abyss-fac. I recently read that assemblies can be improved without further sequencing by at least a couple of different methods:

  1. Map the reads against the assembly, extract the "properly paired" reads and reassemble them with the same k-mer length. Take a look here
  2. Use dedicated software for this, such as REAPR(?)
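For reference, the read-extraction part of method #1 can be sketched in Python on SAM text. This is a minimal illustration, assuming the reads were mapped back to the contigs with any SAM-producing aligner; the function names are hypothetical. It keeps primary records carrying the "properly paired" flag (0x2) and reverse-complements reads that aligned to the reverse strand (flag 0x10), so they come back out in their original orientation. In practice `samtools view -f 2 -F 0x900` does the same flag filtering.

```python
# Sketch: pull primary, properly paired reads out of SAM text (flag 0x2),
# restoring the original orientation of reads mapped to the reverse strand.
PROPER_PAIR = 0x2
REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def properly_paired_reads(sam_lines):
    """Yield (read_name, mate, sequence); mate is 1 or 2 (0 if unknown)."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only the primary record for each read
        if not flag & PROPER_PAIR:
            continue
        if flag & REVERSE:
            # SAM stores reverse-strand reads reverse-complemented
            seq = reverse_complement(seq)
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        yield name, mate, seq
```

If you write the result back out as FASTQ, the quality string (column 11) of reverse-strand reads would need reversing as well.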

I proceeded with #1, and while one of the genome reassemblies produced exactly the same assembly statistics, the other changed quite a bit (top: initial assembly, bottom: reassembly):

n     | n:500 | n:N50 | min | N80  | N50   | N20   | E-size | max    | sum     | name
----- | ----- | ----- | --- | ---- | ----- | ----- | ------ | ------ | ------- | ----
7290  | 6860  | 1122  | 503 | 4914 | 11504 | 22431 | 14693  | 124406 | 43.73e6 | T2paper/velvet/k169/contigs.fa
10638 | 8598  | 1437  | 500 | 3980 | 9021  | 17417 | 11669  | 124406 | 43.75e6 | T2paper/velvet/rek169/contigs.fa
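The contiguity columns in a table like this can be recomputed from a list of contig lengths. The sketch below assumes the usual definitions: Nx is the length of the contig at which the longest-first running sum first reaches x% of the total, and E-size is the sum of squared lengths divided by the total length (the expected size of the contig containing a randomly chosen base). Judging from the `min` column, it also assumes statistics are computed only over contigs of at least 500 bp. abyss-fac's exact thresholds and its `n:N50` definition may differ, so treat this as an approximation rather than abyss-fac's own code; `assembly_stats` is a hypothetical name.

```python
# Sketch of abyss-fac-style contiguity statistics from contig lengths.
def assembly_stats(lengths, min_len=500):
    """Summary statistics over contigs of at least `min_len` bp."""
    kept = sorted((length for length in lengths if length >= min_len), reverse=True)
    total = sum(kept)

    def nx(fraction):
        # Length of the contig at which the longest-first running sum
        # first reaches `fraction` of the total assembly size.
        running = 0
        for length in kept:
            running += length
            if running >= fraction * total:
                return length
        return 0

    n50 = nx(0.5)
    return {
        "n": len(lengths),
        f"n:{min_len}": len(kept),
        "n:N50": sum(1 for length in kept if length >= n50),  # contigs at least N50 long
        "min": kept[-1] if kept else 0,
        "N80": nx(0.8),
        "N50": n50,
        "N20": nx(0.2),
        # E-size: expected length of the contig containing a random base.
        "E-size": sum(length * length for length in kept) / total if total else 0.0,
        "max": kept[0] if kept else 0,
        "sum": total,
    }
```

For example, `assembly_stats([1000, 800, 600, 400, 300])` keeps three contigs totalling 2400 bp, giving N50 = 800 and an E-size of about 833.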

So, my questions are:

  • Is this step common?
  • Is there an easy way to compare the assemblies side by side, or to evaluate them without relying on those numbers?

Thank you in advance,
Xabier
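One reference-free check that doesn't depend on contiguity numbers is to map the same reads back to each assembly and compare how many map at all, and how many map as proper pairs. A minimal sketch, assuming you already have the FLAG column of each alignment as integers (for example from `samtools view reads.bam | cut -f2`); `samtools flagstat` reports comparable figures directly. The function names are hypothetical.

```python
# Sketch: compare read-mapping rates across assemblies using SAM FLAG values.
UNMAPPED = 0x4
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_summary(flags):
    """Percent of primary reads that are mapped / properly paired."""
    primary = [f for f in flags if not f & (SECONDARY | SUPPLEMENTARY)]
    total = len(primary)
    if total == 0:
        return {"reads": 0, "mapped_pct": 0.0, "proper_pct": 0.0}
    mapped = sum(1 for f in primary if not f & UNMAPPED)
    proper = sum(1 for f in primary if f & PROPER_PAIR)
    return {
        "reads": total,
        "mapped_pct": 100.0 * mapped / total,
        "proper_pct": 100.0 * proper / total,
    }


def side_by_side(summaries):
    """Format {assembly_name: summary} as a small aligned text table."""
    rows = [f"{'assembly':<12}{'reads':>10}{'mapped%':>10}{'proper%':>10}"]
    for name, s in summaries.items():
        rows.append(
            f"{name:<12}{s['reads']:>10}{s['mapped_pct']:>10.2f}{s['proper_pct']:>10.2f}"
        )
    return "\n".join(rows)
```

Usage would be something like `print(side_by_side({"k169": mapping_summary(flags_a), "rek169": mapping_summary(flags_b)}))`. A noticeably lower proper-pair rate on one assembly suggests more broken or misjoined contigs.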


I think step #1 from the posted link does not refer to extracting the reads and reassembling them, but rather to estimating the fragment length from the paired-end reads and then reassembling with that information.
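Estimating the fragment length from the mapped pairs could look roughly like this. It's a sketch assuming SAM input: it takes the template length (TLEN, column 9) of primary, properly paired records, counts each pair once by keeping only the positive TLEN of the leftmost mate, and reports the median and standard deviation. Those are the kind of values you would pass to velvetg's `-ins_length` and `-ins_length_sd` options. The function name is hypothetical, and outliers from mis-mapped pairs inflate the standard deviation, so a more robust spread estimate may be preferable.

```python
from statistics import median, pstdev

PROPER_PAIR = 0x2
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def insert_size_estimate(sam_lines):
    """Return (median, standard deviation) of TLEN over properly paired primary records."""
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & SECONDARY_OR_SUPPLEMENTARY or not flag & PROPER_PAIR:
            continue
        if tlen > 0:  # the leftmost mate carries a positive TLEN, so each pair counts once
            sizes.append(tlen)
    if not sizes:
        raise ValueError("no properly paired records with positive TLEN")
    return median(sizes), pstdev(sizes)
```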

Edit: They do have this "Now we created a fairly good assembly, but lets see if we can do it better. Lets try to map the reads to the assembly and then only use mapped reads for another assembly.", but like others said, I don't think this will help the assembly.

9.1 years ago

You should trim your adapters with an adapter-trimming tool like BBDuk first. You can get a rough evaluation of your assembly quality with tools like Quast; generally, the more long genes (1500bp+ and 3000bp+) are called, the better the assembly.
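QUAST runs an actual gene predictor for its gene counts. As a rough stand-in for the same idea, here is a sketch that counts long open reading frames (ATG to stop codon, all three frames on both strands) per assembly. This is a swapped-in proxy, not what QUAST does, and because fungal genes contain introns it will undercount real genes. The 1500 and 3000 bp thresholds follow the answer above; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def long_orf_lengths(seq, min_len=1500):
    """Lengths (nt, stop codon included) of ATG-to-stop ORFs >= min_len on both strands."""
    seq = seq.upper()
    lengths = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens an ORF in this frame
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        lengths.append(i + 3 - start)
                    start = None
            # an ORF still open here runs off the contig end and is not counted
    return lengths


def count_long_orfs(contigs, thresholds=(1500, 3000)):
    """Count ORFs at or above each threshold across all contigs."""
    all_lengths = []
    for contig in contigs:
        all_lengths.extend(long_orf_lengths(contig, min(thresholds)))
    return {t: sum(1 for length in all_lengths if length >= t) for t in thresholds}
```

Running `count_long_orfs` on the contigs of each assembly gives a crude side-by-side comparison: more long ORFs usually means fewer genes split across contig ends.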


That's why I'm redoing the assemblies. The first time, I used SolexaQA for quality trimming, but I didn't screen for adapters.

I'm now using Trim Galore! for that; it does both quality and adapter trimming.
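Trim Galore! wraps Cutadapt, which locates the adapter with an error-tolerant alignment. The core idea can be sketched with exact matching only: cut the read at the first full occurrence of the adapter, or, if only the beginning of the adapter is present at the 3' end, cut that partial match. `AGATCGGAAGAGC` is the 13 bp sequence shared by standard Illumina TruSeq adapters (Trim Galore!'s default for Illumina data); the function name and the `min_overlap` default are assumptions for this illustration.

```python
# Sketch: exact-match 3' adapter trimming (no mismatches, no quality trimming).
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read, adapter=ILLUMINA_ADAPTER, min_overlap=3):
    """Return the read with the adapter (full or 3' partial) removed."""
    hit = read.find(adapter)
    if hit != -1:
        return read[:hit]  # full adapter found: drop it and everything after
    # Otherwise look for the adapter's prefix at the very end of the read,
    # trying the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

A short `min_overlap` removes more genuine adapter but also trims some reads that merely end in a few matching bases by chance; real trimmers balance this with error rates and stringency settings.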

9.1 years ago
Rayan Chikhi ★ 1.5k

No, I don't think that this step #1 is common. In general, reassembling using only the reads which properly mapped to contigs is unlikely to give you a better assembly.

9.1 years ago

A colleague of mine spent years trying to close a 6 Mb bacterial genome. It turned out that the genome had too many repeated sequences, and that means trouble.

He eventually closed the circle by adding PacBio sequencing and running a hybrid assembly.

This is something you should consider very seriously.
