I have two fly genome assemblies from a species for which no other genomes are available. One was assembled from PacBio reads (N50 of about 400,000 bp) and one from 10X reads (N50 of about 250,000 bp). The genome is about 250-300 Mb long.
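For anyone comparing assemblies by the same metric, here is a minimal sketch of how N50 is usually computed from a list of scaffold lengths. The function name `n50` is just an illustrative choice, and the definition assumed is the common one: the length of the scaffold at which the running total, going from longest to shortest, first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

The lengths would come from whatever FASTA index or parser you already use; a merged assembly should push this number up if the merge worked.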
I would like to use the scaffolds from both these genomes to create an assembly with longer scaffolds.
I have tried metassembler (https://sourceforge.net/projects/metassembler/) but it requires mate pairs to find the correspondences between the assemblies and I do not have such paired-end reads.
What tools would you recommend to produce longer scaffolds from multiple assemblies?
EDIT:
Here is a list of software I am presently considering:
I would imagine that you need to look outside the "classic" field of high-throughput sequencing. You most likely need a long-read assembler that works from end overlaps rather than a de Bruijn graph assembler.
For example this (I found this as a search so I can't comment on its applicability)
So basically treat contigs and scaffolds as long reads? That would mean VERY low coverage, on the order of 1 to 2. I'll explore this avenue but something tells me the assemblers are going to struggle with such a low coverage.
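The coverage estimate is simple arithmetic: total assembled bases divided by genome size. A rough sketch, where `approx_coverage` is a made-up helper name and the sizes in the usage note are placeholders rather than measured values:

```python
def approx_coverage(assembly_sizes_bp, genome_size_bp):
    """Depth obtained if whole assemblies are treated as 'reads'.

    assembly_sizes_bp: total length of each assembly, in bp.
    genome_size_bp: estimated haploid genome size, in bp.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sum(assembly_sizes_bp) / genome_size_bp
```

With two assemblies that each span roughly the whole genome, e.g. `approx_coverage([275_000_000, 275_000_000], 275_000_000)`, this gives 2.0, far below the depth that overlap-based long-read assemblers are normally run at.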
Maybe you have already done this, but I'd align the two genomes first to get a synteny map, then plan the assembly depending on what you see. For the alignment, MUMmer (http://mummer.sourceforge.net/) may help, or another tool.
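Once the alignment exists, one quick thing to look for is scaffolds in one assembly that align to two or more scaffolds in the other, since those are the candidate joins. Below is a sketch that assumes tab-delimited output from MUMmer's `show-coords -T -H`, with the first seven columns being S1, E1, S2, E2, LEN1, LEN2, %IDY and the last two being the reference and query sequence names. The function name and the length and identity thresholds are arbitrary choices for illustration, not recommended settings.

```python
from collections import defaultdict


def bridging_scaffolds(coords_text, min_len=10_000, min_identity=95.0):
    """Find query scaffolds that align to two or more reference scaffolds.

    coords_text: tab-delimited alignment rows, assumed to come from
    something like:
        nucmer --prefix=cmp pacbio.fa tenx.fa
        show-coords -T -H cmp.delta > cmp.coords
    (file names here are hypothetical).
    Returns {query_name: sorted list of reference names}.
    """
    hits = defaultdict(set)
    for line in coords_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # blank or unexpected line
        try:
            aln_len = int(fields[4])      # LEN1: alignment length on the reference
            identity = float(fields[6])   # %IDY
        except ValueError:
            continue  # header row or malformed line
        if aln_len >= min_len and identity >= min_identity:
            ref_name, query_name = fields[-2], fields[-1]
            hits[query_name].add(ref_name)
    # Keep only queries that touch at least two different reference scaffolds.
    return {q: sorted(refs) for q, refs in hits.items() if len(refs) >= 2}
```

A query scaffold reported here only suggests a possible join; repeats and misassemblies produce the same signal, so each candidate still needs to be checked against the synteny plot.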
I've used Synima (https://github.com/rhysf/Synima) to generate a synteny map (relatively painlessly for individual eukaryotic chromosomes). It might be worth a try. For annotation input, I've used MAKER2 output: CDS=transcripts from MAKER2, PEP=proteins from MAKER2, gff3=gff files from MAKER2 (http://www.yandell-lab.org/software/maker.html).
What about FALCON?
Is FALCON supposed to be able to merge different assemblies produced by different technologies?
It is able to assemble long sequences from PacBio or MinION. I don't think you can find software that does exactly what you are looking for. Longer scaffolds from scaffolds? Even if you do find software to do that, I think you will need a lot of further karyotype validation before you can use your final sequences.
How about GARM?
Yes, I am looking at GARM. See my edit above.
Please keep that in mind when recommending software. What kind of organism is that? Is ploidy a contributor?
Software list from Omicstools.
The Omicstools list is where I found GARM and Camsa. I sifted through the list and kept a few that looked promising. These two are my best bet for now.
The fly is diploid. The genomes were not assembled from a double haploid individual.
There are a couple others mentioned in this past thread.
Thanks. PBJelly has already been run on the PacBio assembly using the 10X reads, but I had never heard of OPERA-LG. I'll check it out.