Question

Longer scaffolds from multiple eukaryote genome assemblies

0

7.1 years ago

Eric Normandeau 11k

I have two fly genomes from a species for which there are no other genomes available. One genome has been assembled from PacBio reads (N50=~400,000bp) and one from 10X (N50=~250,000bp). The genome is about 250-300Gb long.
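For readers comparing the two N50 values above, here is a minimal sketch of how N50 is usually computed from a list of scaffold lengths. The function name `n50` and the toy lengths are made up for illustration; they are not taken from either assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp, for illustration only.
toy_lengths = [500_000, 400_000, 250_000, 100_000, 50_000]
print(n50(toy_lengths))
```

In practice the lengths would come from the FASTA index or an assembly statistics tool rather than a hand-written list.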

I would like to use the scaffolds from both these genomes to create an assembly with longer scaffolds.

I have tried metassembler (https://sourceforge.net/projects/metassembler/) but it requires mate pairs to find the correspondences between the assemblies and I do not have such paired-end reads.

What tools would you recommend to produce longer scaffolds from multiple assemblies?

EDIT:

Here is a list of software I am presently considering:

GARM: http://garm-meta-assem.sourceforge.net/
Camsa: https://cblab.org/camsa/
OPERA-LG: https://sourceforge.net/projects/operasf/files/?source=navbar

genome scaffolding • 3.1k views

ADD COMMENT • link updated 6.9 years ago by harishk0201 ▴ 130 • written 7.1 years ago by Eric Normandeau 11k

1

what about FALCON?

ADD REPLY • link 7.1 years ago by Buffo ★ 2.4k

0

Is FALCON supposed to be able to merge different assemblies produced by different technologies?

ADD REPLY • link 7.1 years ago by Eric Normandeau 11k

1

It can assemble long sequences from PacBio or MinION reads. I don't think you will find software that does exactly what you are looking for: longer scaffolds from scaffolds? And even if you do find software for that, I think you will need a lot of further karyotype validation before you can use your final sequences.

ADD REPLY • link 7.1 years ago by Buffo ★ 2.4k

1

How about GARM?

ADD REPLY • link 7.1 years ago by Sej Modha 5.3k

0

Yes, I am looking at GARM. See my edit above.

ADD REPLY • link 7.1 years ago by Eric Normandeau 11k

1

The genome is about 250-300Gb long.

Please keep that in mind when recommending software. What kind of organism is that? Is ploidy a contributor?
Software list from Omicstools.

ADD REPLY • link 7.1 years ago by GenoMax 148k

0

The Omicstools list is where I found GARM and Camsa. I sifted through the list and kept a few that looked promising. These two are my best bets for now.

The fly is diploid. The genomes were not assembled from a double haploid individual.

ADD REPLY • link 7.1 years ago by Eric Normandeau 11k

1

There are a couple of others mentioned in this past thread.

ADD REPLY • link updated 7.1 years ago by Eric Normandeau 11k • written 7.1 years ago by GenoMax 148k

0

Thanks. PBJelly has already been run on the PacBio assembly using the 10X reads, but I had never heard of OPERA-LG. I'll check it out.

ADD REPLY • link 7.1 years ago by Eric Normandeau 11k

score 1 · Answer 1 · 2017-11-21

1

7.1 years ago

Istvan Albert 102k

I would imagine that you need to look outside the "classic" field of high-throughput sequencing. You most likely need a long-read assembler that works off end overlaps rather than a de Bruijn graph type of assembler.

For example, this one (I found it through a search, so I can't comment on its applicability):

https://github.com/isovic/racon

ADD COMMENT • link 7.1 years ago by Istvan Albert 102k

0

So basically treat contigs and scaffolds as long reads? That would mean VERY low coverage, on the order of 1 to 2. I'll explore this avenue, but something tells me the assemblers are going to struggle with such low coverage.
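To make the "1 to 2" estimate concrete, here is a rough sketch of the arithmetic, assuming each assembly is treated as a set of long reads and that coverage is simply total assembled bases divided by genome size. The function name and all sizes below are hypothetical placeholders, not the real assembly sizes.

```python
def pseudo_read_coverage(assembly_sizes_bp, genome_size_bp):
    """Approximate coverage if whole assemblies are used as 'reads'.

    Coverage = (sum of bases across all assemblies) / genome size.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sum(assembly_sizes_bp) / genome_size_bp


# Hypothetical numbers: two assemblies that each capture most of a
# 1 Mb toy genome. Real values would come from the actual assemblies.
toy_genome_size = 1_000_000
toy_assemblies = [950_000, 900_000]

coverage = pseudo_read_coverage(toy_assemblies, toy_genome_size)
print(f"Approximate coverage: {coverage:.2f}x")
```

With only two assemblies, this can never go much above 2x, which is far below the depth that overlap-based long-read assemblers are normally run at.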

ADD REPLY • link 7.1 years ago by Eric Normandeau 11k

score 1 · Answer 2 · 2017-11-21

1

7.1 years ago

Sergey Naumenko ▴ 400

Hi Eric!

Maybe you have already done this, but I'd align the two genomes first to see a synteny map, and then plan the assembly depending on what you see. For the alignment, MUMmer (http://mummer.sourceforge.net/) may help, or another tool.

Sergey

ADD COMMENT • link 7.1 years ago by Sergey Naumenko ▴ 400

0

I've used Synima (https://github.com/rhysf/Synima) to generate a synteny map (relatively painlessly for individual eukaryotic chromosomes). It might be worth a try. For annotation input, I've used MAKER2 output: CDS=transcripts from MAKER2, PEP=proteins from MAKER2, gff3=gff files from MAKER2 (http://www.yandell-lab.org/software/maker.html).

ADD REPLY • link 6.9 years ago by jean.elbers ★ 1.7k

score 0 · Answer 3 · 2018-02-10

0

6.9 years ago

harishk0201 ▴ 130

Hey Eric,

Try Quickmerge: https://github.com/mahulchak/quickmerge

But are you sure that these genomes are in Gb rather than Mb? That seems a tad too much.

You can try HaploMerger2 as well.

ADD COMMENT • link 6.9 years ago by harishk0201 ▴ 130