Evaluating genome assemblies
8.5 years ago
EVR ▴ 610

Hi,

I have two genome assemblies from the same non-model organism, generated at different times. Version 1 has ~43,000 scaffolds, while version 2 has ~15,000 scaffolds. The N50 for version 1 was 40,000, while the N50 for the recent version was 78,000. Is there any other way to check which genome (version 1 or version 2) is better? Is there a bioinformatics tool to evaluate features like AT richness or repetitive transcripts in a genome?

Also, I am unable to understand the concept behind the N50 measure. It would be great if someone could explain it.

Kindly guide me.

RNA-Seq genome N50 • 1.9k views

Have you looked at QUAST?


I will look at it now.

8.5 years ago

The Assemblathon 2 paper includes a variety of metrics for evaluating assembly quality.

8.5 years ago
Fabio Marroni ★ 3.0k

Regarding N50, I suggest you read some introductory material, such as this.

That said, I am not a big fan of N50 as a quality metric for genome assemblies (or rather, as the ONLY quality metric). A lot of work has been done on this issue, and I suggest you take a look at this paper.
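For the concept itself, a small sketch may help. N50 is the length of the scaffold at which the running total, counted from the longest scaffold downwards, first reaches half of the total assembly length. The function name `n50` and the toy lengths below are made up for illustration; they are not taken from either of your assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    # Sort from longest to shortest, then walk down the list until the
    # running total covers at least half of the summed length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Two toy assemblies with the same total length (300) but different contiguity.
fragmented = [100, 80, 60, 40, 20]
joined = [180, 60, 40, 20]  # the two longest scaffolds merged into one

print(n50(fragmented))  # 80
print(n50(joined))      # 180
```

Merging two scaffolds raises the N50 whether or not the join is biologically correct, which is one reason not to rely on it as the only metric.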

8.5 years ago
igor 13k

You have to be very careful with individual metrics such as N50, which is why people generally publish a full table of metrics for their assembly. See the previous (6 years ago!) discussions here: What Does The "N50" Mean? and How To Assess The Quality Of An Assembly? (Is There No Magic Formula?)
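As a rough idea of what a few rows of such a table could look like, here is a minimal sketch in plain Python that reports scaffold count, total length, longest scaffold, AT fraction and N fraction for a FASTA file. The function names `read_fasta` and `assembly_summary` and the toy sequences are my own assumptions for illustration, and N50 is left out to keep it short. It is not a substitute for a dedicated tool.

```python
from collections import Counter
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def assembly_summary(handle):
    """Collect a few basic per-assembly metrics from a FASTA handle."""
    lengths = []
    bases = Counter()
    for _name, seq in read_fasta(handle):
        lengths.append(len(seq))
        bases.update(seq)
    total = sum(lengths)
    acgt = sum(bases[b] for b in "ACGT")
    return {
        "scaffolds": len(lengths),
        "total_length": total,
        "longest": max(lengths, default=0),
        # AT fraction is computed over unambiguous bases only.
        "at_fraction": (bases["A"] + bases["T"]) / acgt if acgt else 0.0,
        # Ns usually mark gaps between contigs inside a scaffold.
        "n_fraction": bases["N"] / total if total else 0.0,
    }


# Toy input standing in for a real assembly file.
toy = io.StringIO(">s1\nAATTGGCC\nAATT\n>s2 example\nATNNGC\n")
for metric, value in assembly_summary(toy).items():
    print(metric, value)
```

For a real assembly you would pass an opened file instead of the `io.StringIO` object, once per version, and compare the two summaries side by side.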

As genomax2 already mentioned in the comments, I would suggest looking into QUAST (Quality Assessment Tool for Genome Assemblies): http://quast.sourceforge.net/quast.html

It's a nice tool to visualize your assembly quality without worrying about specific metrics.

Also, I see you tagged the question as "rna-seq". In that case, all the metrics are going to be different than for genome assembly, since you are expecting thousands of relatively short transcripts as opposed to a few long chromosomes.
