Question

Comparing genome assemblies of the same animal

0

6.2 years ago

a.rex ▴ 350

I have a genome of a species, which I have annotated in house. The N50 for this genome is around 41,000.

I have another, better genome of the same species, which I have also annotated in house. The N50 for this one is 4,000,000.

I suspect my poorer genome has split and incomplete transcripts, which are resolved in my better genome.

Does anyone have any tips on how I can find these split instances that are resolved in the better genome?

Assembly • 1.5k views

0

I have two protein FASTA files from the same animal. One comes from an assembly with a small N50; the other, a PacBio assembly, has a larger N50.

I want to blastp the proteins of the bad genome against those of the better one, and then extract the query length and hit length from the results. I will then blastp the better one against the worse one.

So when I blast the bad genome against the PacBio one, the query length should be much smaller than the hit length. The opposite should be true when I blast the PacBio genome against the bad genome.

How can I extract the query and hit lengths from the blast results?

ADD REPLY • link 6.2 years ago by a.rex ▴ 350

score 2 · Answer 1 · 2018-09-19

2

6.2 years ago

Philipp Bayer 8.7k

Mick Watson came up with this simple test for bacterial genomes: http://www.opiniomics.org/a-simple-test-for-uncorrected-insertions-and-deletions-indels-in-bacterial-genomes/

With an N50 of 4,000,000 I don't expect you to have a bacterial genome :) However, I don't see why this simple test wouldn't work if you blastp your 'poor' genome's proteins against a database built from your 'better' genome's proteins.
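
On getting the query and hit lengths out of the results: BLAST+ can write them directly in tabular output, since `qlen` and `slen` are valid `-outfmt 6` fields. Something like `blastp -query poor.faa -db better_db -outfmt "6 qseqid sseqid qlen slen sstart send evalue bitscore"` should do it (the file and database names here are made up). Below is a rough Python sketch, not a tested pipeline, that keeps the best hit per query, computes the query/hit length ratio, and flags 'better' proteins that are covered by two or more short, non-overlapping 'poor' proteins. The column order, the function names and the 0.8 ratio cutoff are all my assumptions, so adjust them to your data.

```python
import csv
import io
from collections import defaultdict

# Assumed column order; must match the -outfmt string used for blastp.
COLUMNS = ["qseqid", "sseqid", "qlen", "slen", "sstart", "send", "evalue", "bitscore"]

# Tiny made-up example in the assumed format (not real BLAST output).
EXAMPLE = (
    "poorA\tgood1\t120\t400\t1\t118\t1e-60\t210\n"
    "poorB\tgood1\t150\t400\t240\t390\t1e-80\t260\n"
    "poorC\tgood2\t300\t305\t1\t300\t1e-150\t550\n"
)


def best_hits(handle):
    """Return the highest-bitscore hit per query, with a qlen/slen ratio added."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in ("qlen", "slen", "sstart", "send"):
            hit[key] = int(hit[key])
        hit["bitscore"] = float(hit["bitscore"])
        hit["ratio"] = hit["qlen"] / hit["slen"]
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def split_candidates(best, max_ratio=0.8):
    """Map each subject to the short queries that tile it without overlapping.

    max_ratio is an arbitrary cutoff (assumption): only queries much shorter
    than their hit are considered possible fragments of a split gene.
    """
    by_subject = defaultdict(list)
    for hit in best.values():
        if hit["ratio"] <= max_ratio:
            by_subject[hit["sseqid"]].append(hit)

    candidates = {}
    for sseqid, hits in by_subject.items():
        hits.sort(key=lambda h: h["sstart"])
        # Consecutive hits must cover distinct stretches of the subject.
        non_overlapping = all(a["send"] < b["sstart"] for a, b in zip(hits, hits[1:]))
        if len(hits) >= 2 and non_overlapping:
            candidates[sseqid] = [h["qseqid"] for h in hits]
    return candidates


if __name__ == "__main__":
    hits = best_hits(io.StringIO(EXAMPLE))
    for qseqid, hit in hits.items():
        print(qseqid, hit["sseqid"], round(hit["ratio"], 2))
    print(split_candidates(hits))
```

For real data you would pass an open file handle instead of the `io.StringIO` example. A histogram of the ratios (as in the linked post) gives the overall picture, while the candidate list points at individual genes worth inspecting by eye.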