I have several FASTA files of assemblies from different samples. Is there a recommended method for merging de novo assemblies? I have 88 assemblies, 2 of which seem to be of high quality. The QUAST results are shown below for all of the assemblies. Many of the reads are short, and I feel like they could be collapsed into some of the contigs from the other assemblies. Are there recommended methods for collapsing all of the assemblies into a consensus assembly? I stumbled across Metassembler but haven't heard much about it. Any advice would be greatly appreciated.
Do you have the original fastqs for all the assemblies?
Yes, I have the original fastq files which include R1, R2, and singletons.
Yep, Damian's answer was where I was headed. Just concatenate all your R1s and all your R2s and assemble that.
Depending on the quality, you may not want to use all of them. If you have some which are lower quality, there’s no point ‘tainting’ your other reads with them.
88 lots of fastqs may also lead to too much depth, in which case you may need to downsample.
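The concatenate-then-downsample idea above can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the function names (`concatenate`, `downsample_pairs`) and the file names are hypothetical, every FASTQ record is assumed to span exactly four lines, each file is assumed to end with a newline, and the R1 and R2 files must be listed in the same sample order so that mates stay paired.

```python
import gzip
import random
import shutil
from itertools import islice


def concatenate(paths, out_path):
    """Append files byte for byte (works for plain FASTQ and for gzip members)."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def read_records(handle):
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        yield record


def downsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return the number kept."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = 0
    with open_text(r1_in, "r") as f1, open_text(r2_in, "r") as f2, \
            open_text(r1_out, "w") as o1, open_text(r2_out, "w") as o2:
        for rec1, rec2 in zip(read_records(f1), read_records(f2)):
            # One random draw per pair, so both mates are kept or dropped together.
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

For example, `concatenate(["s1_R1.fastq", "s2_R1.fastq"], "all_R1.fastq")` followed by the same call for the R2 files, then `downsample_pairs("all_R1.fastq", "all_R2.fastq", "sub_R1.fastq", "sub_R2.fastq", 0.25)` would keep roughly a quarter of the pairs. Low-quality samples can simply be left out of the input lists.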
Do you have 88 assemblies of the same organism?
Yes, I have 88 de-novo assemblies based on the same reference.
It is implied, but it is not clear to me, that you have 88 sequencing samples, but these samples are all from the same reference. Could you clarify your experimental design?
I will list some possibilities from my incomplete understanding of your question:
Do you have one fastq dataset, which you assembled several times, with different methods and parameters, to get 88 assemblies?
Or do you have 88 fastq datasets, all from different isolates of the same organism, and assembled these 88 isolates separately?
Or (finally) do you have 88 fastq datasets, all from the same isolate, and assembled these 88 datasets separately?
Yes, I have 88 different metagenomic samples. I've mapped to a collection of taxa that I'm interested in (a very small list) to get all species in the genus. I am assuming that there will be slight differences in community between samples. I've tried coassembling these but, as another commenter mentioned, I had to subsample, and that decreased the amount of data I had for assembling rare species. I have a feeling that there will be overlaps in contigs among the different sample sets. Are there any tools that can look for this and extend contigs that have high overlap?
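To make the "extend contigs that have high overlap" idea concrete, here is a toy Python sketch of exact suffix-prefix overlap merging. It is not how any particular reconciliation tool works: the function names are hypothetical, only exact matches on the forward strand are considered, and `min_overlap` is an assumed parameter that should be at least 1. Real tools also have to deal with mismatches, reverse complements, and repeats, and merging contigs from different samples can create chimeras when closely related strains are present.

```python
def suffix_prefix_overlap(left, right, min_overlap):
    """Return the length of the longest suffix of `left` that is a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_if_overlapping(left, right, min_overlap=20):
    """Extend `left` with `right` if they overlap exactly; otherwise return None."""
    size = suffix_prefix_overlap(left, right, min_overlap)
    if size == 0:
        return None
    # Keep the shared region once, then append the non-overlapping tail of `right`.
    return left + right[size:]


# Two short made-up contigs sharing the 5 bases "CCGGG".
merged = merge_if_overlapping("AAACCCGGG", "CCGGGTTT", min_overlap=5)
print(merged)  # AAACCCGGGTTT
```

With a stricter threshold such as `min_overlap=6`, the same pair is left unmerged and the function returns `None`.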
Try some of these perhaps?
https://omictools.com/assembly-reconciliation-category
I can’t personally speak to any of them as I’ve never tried what you’re attempting.
Is there any particular reason why you constructed 88 assemblies? As Damian Kao says, why not simply do one assembly with all the data?
Hi O.rka,
Have you tried Metassembler on your datasets?
I assembled 7 different fish genomes with different k-mer values (ABySS, SPAdes, Velvet).
I then used CD-HIT to remove the redundant contigs, which gave me a less redundant set of contigs.
Can I use Metassembler to merge the assemblies?
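As a rough illustration of what removing redundant contigs means, here is a simplified Python sketch. It is not CD-HIT and does not reproduce its identity-threshold clustering: the hypothetical `drop_contained` function only discards contigs that are exact substrings (on either strand) of a longer contig, and it assumes sequences contain only A, C, G, and T.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def drop_contained(contigs):
    """Drop contigs that are exact substrings of a longer kept contig.

    `contigs` maps contig name to sequence; both strands are checked.
    """
    kept = []
    # Visit the longest contigs first so shorter ones are compared against them.
    by_length = sorted(contigs.items(), key=lambda item: (-len(item[1]), item[0]))
    for name, seq in by_length:
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for _, other in kept):
            continue  # redundant: fully contained in a contig we already kept
        kept.append((name, seq))
    return dict(kept)


# Made-up contigs: "b" is contained in "a"; "c" is contained on the reverse strand.
example = {
    "a": "ACGTTGCAAGGCT",
    "b": "GTTGCA",
    "c": "CCTTG",
    "d": "TTTTTTT",
}
non_redundant = drop_contained(example)
print(sorted(non_redundant))  # ['a', 'd']
```

The quadratic comparison is fine for a toy example but would be far too slow for real assemblies, which is one reason dedicated clustering tools exist.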
You're definitely going to get an answer if you merge the assemblies, but TBH I steer clear of this type of workflow because it can introduce artifacts if not done very carefully. I haven't tried Metassembler yet, so I can't comment.
Check out IMAP: https://github.com/jkimlab/IMAP