Re-assembling already assembled genomes
2.8 years ago
Sissi

Hi there,

I'm planning a comparative genomics analysis of bacterial genomes. I downloaded them from NCBI and found that many of them are scaffold-level assemblies with hundreds of contigs, while others are complete, assembled into a single chromosome. Today I was asked some really simple questions that left me quite confused...

1) Do you think I can use a complete genome as a reference and try to re-assemble the other, fragmented genomes? That is, give the fragmented FASTA genome as input to an assembler and use the reference FASTA as the reference? (Ideally I would also get the FASTQ files from NCBI, but not all of them are available.)

2) Would reordering the contigs with Mauve improve the genome somehow? Of course it won't reduce the number of contigs, but would it improve the annotation instead?

3) I also noticed that some genomes are in NCBI Assembly, while others are only in NCBI Genome. Why is that?

Any suggestions or explanations? Thanks, Silvia

Hi Silvia,

Can you give me an example of a genome that is in Genome but not Assembly?

Hi there,

If you search for Pseudomonas avellanae in NCBI Genome you find 16 entries, but in NCBI Assembly you get only 14: R2sc214 and R25260 are missing.

Because R2sc214 and R25260 are marked as anomalous assemblies and are therefore excluded from RefSeq and GenBank.
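To check this for many genomes at once (including which ones are complete versus scaffold-level), one option is NCBI's tab-separated assembly_summary.txt table. The sketch below assumes that file's layout as we understand it: comment lines start with '#', the last comment line holds the column names, and the columns `assembly_level` and `excluded_from_refseq` exist. The toy rows, accessions, and the "anomalous" value are made up for illustration; check the header and values of the real file before relying on this.

```python
import io

# Hypothetical toy excerpt in the style of NCBI's assembly_summary.txt.
# Accessions and values are invented for illustration only.
TOY_SUMMARY = (
    "#   toy header comment\n"
    "# assembly_accession\torganism_name\tassembly_level\texcluded_from_refseq\n"
    "GCA_TOY_1\tPseudomonas avellanae\tComplete Genome\t\n"
    "GCA_TOY_2\tPseudomonas avellanae\tScaffold\t\n"
    "GCA_TOY_3\tPseudomonas avellanae\tContig\tanomalous\n"
)


def read_assembly_summary(handle):
    """Parse an assembly_summary-style table into a list of dicts."""
    header = None
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Keep overwriting: the last comment line is taken as the header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if not line:
            continue
        if header is None:
            raise ValueError("no '#' header line found before data rows")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def is_excluded(row):
    # Assumption: an empty or 'na' value means "not excluded from RefSeq".
    return row.get("excluded_from_refseq", "").strip().lower() not in ("", "na")


rows = read_assembly_summary(io.StringIO(TOY_SUMMARY))
excluded = [r["assembly_accession"] for r in rows if is_excluded(r)]
complete = [r["assembly_accession"] for r in rows
            if r.get("assembly_level") == "Complete Genome"]
print(excluded)  # ['GCA_TOY_3']
print(complete)  # ['GCA_TOY_1']
```

With the real file you would pass an open file handle instead of the `io.StringIO` toy and filter on `organism_name` as needed.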

Yes, exactly what's written in the FAQ ;)

If I understand correctly, you are wondering whether you can use the contiguous, reference-like assemblies to improve the less contiguous ones?

There are tools for this; look into reference-based (reference-guided) assembly. However, whether it helps depends on what you want to look at in your comparisons.

For example, the unassembled regions are most likely complex regions that are probably highly variable, so my guess is that, for the regions you 'solve' with reference-guided assembly, you may just be introducing reference bias. This is only a guess; I have not checked it or read about it.

Your alternative, which I think you touch on and which is perhaps better, is to use the reference genome only to scaffold the less contiguous assemblies. This will again produce scaffolds whose structure is biased towards the reference, but at least the gaps filled with 'N's let you keep track of the missing information.
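The scaffolding idea can be illustrated with a minimal sketch: contigs are ordered and oriented by where they map on the reference, then joined with runs of N, so the contig sequences themselves are unchanged and the gaps stay visible. It assumes you already have each contig's start position and strand on the reference (for example parsed from an aligner's output); `Placement`, `scaffold`, and the 100-N gap are hypothetical choices for illustration, not how RagTag, MeDuSa, or Mauve actually work internally.

```python
from dataclasses import dataclass

DEFAULT_GAP = "N" * 100  # hypothetical fixed gap length; real tools decide their own


@dataclass
class Placement:
    """Where a contig maps on the reference (assumed to come from an aligner)."""
    name: str
    ref_start: int
    strand: str  # '+' or '-'


def revcomp(seq):
    """Reverse-complement a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def scaffold(contigs, placements, gap=DEFAULT_GAP):
    """Order and orient placed contigs by reference position, join with N gaps.

    Contigs with no placement are returned separately as unplaced.
    """
    placed = sorted(placements, key=lambda p: p.ref_start)
    pieces = []
    for p in placed:
        seq = contigs[p.name]
        pieces.append(revcomp(seq) if p.strand == "-" else seq)
    placed_names = {p.name for p in placed}
    unplaced = {n: s for n, s in contigs.items() if n not in placed_names}
    return gap.join(pieces), unplaced


contigs = {"c1": "AAAC", "c2": "GGTT", "c3": "CCCC"}
placements = [Placement("c2", 500, "-"), Placement("c1", 10, "+")]
scaf, unplaced = scaffold(contigs, placements, gap="NNNNN")
print(scaf)      # AAACNNNNNAACC
print(unplaced)  # {'c3': 'CCCC'}
```

Note that the order and orientation come entirely from the reference, which is exactly where the reference bias mentioned above enters; contigs that do not map (like `c3`) are left out rather than forced into place.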

2.8 years ago

It is likely that, by making use of all data deposited for an organism, you could improve upon just about every fragmented assembly of that organism.

But it is unlikely that you can easily turn each one into a reliable, high-quality assembly; in my experience, most fragmented assemblies are fragmented because the underlying data had lots of issues.

It would probably take substantial effort, manual curation, and investigation to fix these assemblies, and each one would need to be dealt with individually.

In the end, the goals of your project matter the most. Is a 5% improvement an acceptable result? If so, you can probably automate the process. Do you need to reliably identify the differences between strains? That would be a far more difficult task.
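The answer does not say which measure the "5% improvement" refers to; one common way to quantify the contiguity of an assembly is N50, so the sketch below assumes that metric. It computes N50 from contig lengths (read from FASTA lines) and the percentage change before and after an improvement attempt. The example lengths are invented. Keep in mind that a higher N50 only means more contiguous, not more correct.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Invented contig lengths before and after an improvement attempt.
before = [100, 200, 300, 400]
after = [315, 285, 400]
print(n50(before), n50(after), percent_change(n50(before), n50(after)))
# prints: 300 315 5.0
```

For a real assembly, pass an open file handle, e.g. `with open(path) as fh: lengths = fasta_lengths(fh)`, where `path` is your FASTA file.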

2.8 years ago

If you decide not to try to improve your assemblies for the reasons given by others (automated tools such as MeDuSa do exist, but will naturally introduce reference bias), then I would go for an orthologue or pangenome approach.
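To illustrate what a pangenome approach produces, here is a minimal sketch that classifies gene clusters into core, accessory, and unique sets and builds a presence/absence matrix. It assumes you already have a mapping from cluster ID to the genomes carrying it, as an orthology or pangenome tool would output; the input format, cluster names, and genomes are hypothetical. Because fragmented assemblies can make a gene look absent (for example when it is split across contig ends), some pipelines relax the core threshold below 100%, which is what `core_fraction` models here.

```python
def classify_pangenome(clusters, genomes, core_fraction=1.0):
    """Split gene clusters into core, accessory and unique sets.

    clusters: dict mapping cluster ID -> set of genome names carrying it.
    genomes: set of all genome names in the comparison.
    core_fraction: minimum fraction of genomes a cluster needs to be 'core'.
    """
    n = len(genomes)
    core, accessory, unique = set(), set(), set()
    for cid, present in clusters.items():
        k = len(present & genomes)
        if k == 0:
            continue  # cluster not found in any genome of interest
        if k >= core_fraction * n:
            core.add(cid)
        elif k == 1:
            unique.add(cid)
        else:
            accessory.add(cid)
    return core, accessory, unique


def presence_absence(clusters, genome_order):
    """1/0 row per cluster, one column per genome in `genome_order`."""
    return {cid: [int(g in present) for g in genome_order]
            for cid, present in sorted(clusters.items())}


# Hypothetical clusters for three genomes.
genomes = {"A", "B", "C"}
clusters = {"g1": {"A", "B", "C"}, "g2": {"A", "B"},
            "g3": {"C"}, "g4": {"A", "B", "C"}}
core, accessory, unique = classify_pangenome(clusters, genomes)
print(sorted(core), sorted(accessory), sorted(unique))
print(presence_absence(clusters, ["A", "B", "C"]))
```

The comparison then works on which genes are shared or missing rather than on whole-genome structure, so it is less sensitive to how fragmented each assembly is.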

There are many tools, but I would recommend

Good luck

2.8 years ago
liorglic

You can try RagTag for scaffolding/ordering of fragmented genomes based on a reference genome.
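As a hedged sketch of running RagTag over several fragmented genomes with one complete genome as reference, the helper below builds the command `ragtag.py scaffold <reference> <query> -o <outdir>`. The `scaffold` subcommand and the `-o` output-directory option are as we understand the RagTag documentation; check `ragtag.py scaffold -h` for your installed version. File names and the output-directory naming scheme are hypothetical.

```python
import shutil
import subprocess


def ragtag_scaffold_cmd(reference, query, outdir="ragtag_output"):
    """Build the argument list for reference-based scaffolding with RagTag."""
    return ["ragtag.py", "scaffold", reference, query, "-o", outdir]


def run_ragtag(reference, query, outdir="ragtag_output"):
    """Run RagTag if it is on PATH; raise an error otherwise."""
    if shutil.which("ragtag.py") is None:
        raise RuntimeError("ragtag.py not found on PATH")
    cmd = ragtag_scaffold_cmd(reference, query, outdir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Hypothetical file names: one output directory per fragmented genome.
for query in ["strainA.fasta", "strainB.fasta"]:
    outdir = query.rsplit(".", 1)[0] + "_ragtag"
    print(" ".join(ragtag_scaffold_cmd("complete_reference.fasta", query, outdir)))
```

The loop only prints the commands; call `run_ragtag` instead once the paths point at your real files. As noted in other answers, the resulting scaffolds inherit the reference's structure.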

2.8 years ago
LauferVA

Sissi, what exactly are you trying to do?

Check this out: https://www.biorxiv.org/content/10.1101/2021.09.28.462107v1.full.pdf. Would this help?

Silvia, I know my answer might seem random, but really, really read this. I think it could help you.
