How To Decide Which Genome Assembly Is Better?
11.8 years ago
thiagomafra ▴ 70

Hi everybody,

I have two draft assemblies of a yeast genome, one assembled with Velvet and the other with CLC Workbench, with the following reports:

Assembled by Velvet

Length      11354767
GC%        38.22%
N's            1579
Scaffolds   214
N50           360969
Min            302
Max           705532

Assembled by CLC

Length       11533444
GC%         37.22%
N's             2490
Scaffolds    851
N50           72755
Min            301
Max           215625

My question: is N50 the main parameter for deciding which assembly is best? If so, the Velvet draft would be the better one in this case.
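For readers unfamiliar with the metric, here is a minimal sketch of how N50 is usually computed from scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from either report above.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy lengths (hypothetical), not from the Velvet or CLC assemblies.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the value depends only on the length distribution, it says nothing by itself about whether the scaffolds were joined correctly.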

assembly denovo velvet • 7.7k views

Take a look at "How To Assess The Quality Of An Assembly? (Is There No Magic Formula?)" for some other ways to compare the assemblies. The Velvet assembly looks more contiguous and has fewer Ns, but there are some additional comparisons you could make (discussed in that previous question) that might help you decide.


I would also have a look at FRCBam (http://arxiv.org/abs/1210.1095). I have had good experiences with it.

11.8 years ago
Ryan Thompson ★ 3.6k

Assuming that the scaffolds have all been assembled correctly, the Velvet assembly is clearly superior. However, you should align your scaffolds to the reference genome to verify that there are no assembly errors. When tool A has a larger N50 than tool B, the question is always whether tool A is superior because it correctly assembled more, or whether tool B is superior because it correctly rejected more incorrect joins. You can't know which one is true unless you align to the reference genome.

(Of course, the reference genome can also have errors, but these should show up consistently against both of your assemblies.)
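To illustrate why a larger N50 can be misleading, here is a rough sketch that recomputes N50 after breaking scaffolds at suspected misjoin positions. The misjoin positions would have to come from aligning the scaffolds to a reference; here they are simply assumed, and all names and numbers are hypothetical toy data.

```python
def n50(lengths):
    """N50: length at which the cumulative sum of the sorted
    (largest first) lengths reaches half of the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


def break_at_misjoins(scaffold_lengths, misjoins):
    """Split each scaffold at its suspected misjoin positions.

    scaffold_lengths: dict of scaffold name -> length
    misjoins: dict of scaffold name -> list of break positions,
              assumed to satisfy 0 < position < length
    Returns the list of resulting piece lengths.
    """
    pieces = []
    for name, length in scaffold_lengths.items():
        cuts = sorted(set(misjoins.get(name, [])))
        bounds = [0] + cuts + [length]
        pieces.extend(end - start for start, end in zip(bounds, bounds[1:]))
    return pieces


# Hypothetical toy data: assembly A contains one misjoined scaffold.
assembly_a = {"a1": 100, "a2": 40, "a3": 20}
assembly_b = {"b1": 60, "b2": 50, "b3": 30, "b4": 20}
misjoins_a = {"a1": [45]}

print("A raw N50:      ", n50(assembly_a.values()))
print("A corrected N50:", n50(break_at_misjoins(assembly_a, misjoins_a)))
print("B raw N50:      ", n50(assembly_b.values()))
```

In this toy case, assembly A has the higher raw N50, but once its single misjoin is broken its N50 falls below that of assembly B, which is the situation described above.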


If the alignments look similar, see "Assessing The Quality Of De Novo Assembled Data", which lists a couple of tools that you can use to judge the quality of your assembly. I've had good experiences with hagfish (albeit that one goes into much more detail). I have never tried QUAST, but it looks like it could help you out more.

11.1 years ago
Hayssam ▴ 280

Hi,

Confirming Philipp's suggestion, you should have a look at the QUAST tool. We used it extensively for our publication about finishing bacterial genome assemblies by mixing the results of various assemblers, and QUAST outperformed the alternatives in terms of features, stability and customization. It can run in reference mode (even if you only have a closely related species) or in no-reference mode. In the specific case of bacterial genomes (this might hold for other kingdoms), we showed in our MIX publication (there if you're interested) that N50 does a good job of picking the best assemblies in no-reference mode.
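As a rough sketch of the two modes, the helper below builds a QUAST command line for several assemblies, with or without a reference. The FASTA file names and the helper `quast_command` are hypothetical. The `-o` (output directory) and `-r` (reference) options are as I understand QUAST's command line, so check `quast.py --help` for your installed version.

```python
import shlex


def quast_command(assemblies, output_dir, reference=None):
    """Build the argument list for a QUAST run.

    assemblies: list of FASTA paths, one per assembly to compare
    output_dir: directory where QUAST should write its report
    reference:  optional reference FASTA; omit for no-reference mode
    """
    cmd = ["quast.py", *assemblies, "-o", output_dir]
    if reference is not None:
        cmd += ["-r", reference]
    return cmd


# Hypothetical file names for the two drafts discussed in this thread.
drafts = ["velvet_scaffolds.fasta", "clc_scaffolds.fasta"]

# No-reference mode.
print(shlex.join(quast_command(drafts, "quast_noref")))

# Reference mode, e.g. with a closely related species.
print(shlex.join(quast_command(drafts, "quast_ref", reference="reference.fasta")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually launch the comparison.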
