7.0 years ago
popayekid55
Hi all,
I am very new to genome assembly. I have four datasets for a genome of about 50 Mb: an Illumina short-insert library (200-500 bp), a long-insert library (300-600 bp), a mate-pair library (3-5 kb) and Nanopore reads (yet to be received). Could you suggest which assemblers to use, and which data to use at which stage (assembly, scaffolding or gap closure)? How do I find the best k-mer size apart from using KmerGenie? Is there any other tool for choosing k?
Detailed input is much appreciated.
Right now I am thinking of using SPAdes and IDBA-UD for assembly, with the short- and long-insert libraries together. Beyond that I need some assistance.
Thank you
Please elaborate on the species, or at least kingdom of life to which your species belongs. Also, does a reference genome already exist for this species?
There are many options for assembly, including:
The data I have is 2 x 150 bp paired-end reads, from a green alga. I thought Velvet was limited to a maximum read length of 2 x 100. I have already started with ABySS, using the k from KmerGenie. I am also trying IDBA-UD, as it is a multi-k-mer assembler. What other options are there?
No reference genome is available.
Velvet can tolerate any read length. It was designed for short-read assembly, but it accepts short reads, long reads, or a mixture of both. To be honest, I don't believe that any genome can be faithfully assembled from short reads alone, and I think that relying on short reads sacrifices too much of the assembly's precision. This is a personal opinion based on my knowledge of the assembly algorithms.
Thank you. I will try Velvet. Any suggestions for scaffolding and gap closing?
I'd suggest Unicycler; I've heard lots of good things about it.
Unicycler is exclusively a bacterial assembler, and mine is an algal sample. I could try it for a bacterial sample. Thank you for the suggestion. Are there any assemblers specific to algae?
Have you tried SPAdes?
I have already used SPAdes, ABySS and IDBA-UD.
If you already have a bunch of assemblies then you should think about consolidating them using GARM or equivalent.
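Before consolidating, it can help to compare the assemblies on a simple contiguity metric such as N50. The following is only a minimal sketch in plain Python, not part of GARM or any of the assemblers mentioned; the function names `contig_lengths` and `n50` and the toy records are made up for illustration. N50 says nothing about correctness, so treat it as one input among several.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = 0
    seen_header = False
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header:
                lengths.append(current)  # close the previous record
            seen_header = True
            current = 0
        elif seen_header:
            current += len(line)  # sequence may span several lines
    if seen_header:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Toy records standing in for a real contigs file (hypothetical data).
    toy = [">c1", "AAAA", "AA", ">c2", "CCC", ">c3", "G"]
    print(n50(contig_lengths(toy)))
```

For a real assembly, pass an open file handle instead of the toy list, for example `n50(contig_lengths(open(path)))`, where `path` is whatever contigs file your assembler wrote.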