Question

What should I use for reference-guided assembly of bacterial genomes intended for comparative genomic analyses?

0

8.8 years ago

jerrybug109 ▴ 20

Hi everyone!

I'm new to genomics and bioinformatics and was hoping to get some advice from seasoned bioinformaticians.

My lab has sequenced the genomes of several dozen strains of Bacillus subtilis, and we've gotten the whole-genome short reads (paired-end, 150 bp, Illumina HiSeq) back. We now need to assemble their genomes. After that we want to do a comparative genomic analysis of them: compare gene content and function, differential regimes of positive selection on genes, shared/unique genes, known/novel genes responsible for ecological adaptations in nature, etc.

Our lab has done comparative studies on these strains before, based on phylogenetic analyses of one or three housekeeping genes, but this is our first time getting our hands on their whole-genome sequences, and I don't have any experience with genome assembly.

There are plenty of B. subtilis reference strains available, so I'm guessing a reference-guided assembly would be our safest bet (I'm not sure why we would want to do de novo assembly if we have references available). I'm not familiar with the software or reference-guided assembly pipelines out there. Do you have any suggestions for software or pipelines/approaches we should use to assemble our genomes?

Excited to join the field, thanks!

genomics software assembly • 7.6k views

ADD COMMENT • link updated 19 months ago by Ram 44k • written 8.8 years ago by jerrybug109 ▴ 20

Ram · Answer 1 · 2016-02-23

1

8.8 years ago

piet ★ 1.9k

Assemble every genome de novo with SPAdes. Then map the contigs from all assemblies to an appropriate finished genome in one run. Except for repeats, SPAdes will assemble most of the genome with high accuracy. Assembling a single B. subtilis genome will presumably take about 20 min on a desktop or notebook.

Laboratory strain 168 (AL009126.3) is the model organism for all Firmicutes and its genome is very well annotated, but I would not be surprised if some field isolates of B. subtilis map very poorly to it.
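As a rough illustration of the per-strain de novo step, here is a minimal Python sketch that builds one `spades.py` command per strain. The strain names, the `reads/` and `assemblies/` directories, and the `<strain>_R1.fastq.gz` / `<strain>_R2.fastq.gz` naming scheme are assumptions made for the example, not something from this thread; adjust them to however your sequencing provider named the files.

```python
from pathlib import Path


def spades_command(strain, reads_dir, out_root, threads=4):
    """Build the spades.py argument list for one paired-end sample.

    Assumes (hypothetically) that reads are named <strain>_R1.fastq.gz
    and <strain>_R2.fastq.gz inside reads_dir.
    """
    reads_dir = Path(reads_dir)
    return [
        "spades.py",
        "-1", str(reads_dir / f"{strain}_R1.fastq.gz"),  # forward reads
        "-2", str(reads_dir / f"{strain}_R2.fastq.gz"),  # reverse reads
        "-t", str(threads),                              # number of threads
        "-o", str(Path(out_root) / strain),              # one output dir per strain
    ]


if __name__ == "__main__":
    # Hypothetical strain names; print the commands instead of running them.
    for strain in ["strainA", "strainB"]:
        print(" ".join(spades_command(strain, "reads", "assemblies")))
```

Each command can then be run from a shell or passed to `subprocess.run(cmd, check=True)`. SPAdes writes the assembled contigs to `contigs.fasta` inside the output directory.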

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 8.8 years ago by piet ★ 1.9k

0

Can SPAdes also do the mapping/reference assembly? I noticed that some de novo assemblers seem to be able to do reference-based assembly as well, but was wondering if it would be better to find a reference-guided tool specifically for that step. Thanks for your advice!

ADD REPLY • link 8.8 years ago by jerrybug109 ▴ 20

0

Mapping and assembly are very different things. Usually the best programs are those dedicated to a single task. I recommend using 'bwa mem' to map SPAdes contigs to a reference genome, see here.

I use this often for contigs of bacterial genomes and it works quite well, even though bwa mem is intended for aligning short reads.
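For illustration, a minimal Python sketch of that mapping step: index the reference once, then align the contigs of one assembly with `bwa mem`, which writes SAM to standard output. The file names in the usage note are placeholders, and the helper names `bwa_commands` and `map_contigs` are made up for this example.

```python
import subprocess


def bwa_commands(reference, contigs):
    """Return the two bwa invocations: build the index, then align contigs."""
    return [
        ["bwa", "index", str(reference)],
        ["bwa", "mem", str(reference), str(contigs)],
    ]


def map_contigs(reference, contigs, sam_out):
    """Run bwa index and bwa mem, saving the SAM output to sam_out.

    Requires bwa on the PATH; raises CalledProcessError if either step fails.
    """
    index_cmd, mem_cmd = bwa_commands(reference, contigs)
    subprocess.run(index_cmd, check=True)
    with open(sam_out, "w") as sam:
        # bwa mem prints alignments to stdout, so redirect it into the file.
        subprocess.run(mem_cmd, stdout=sam, check=True)
```

A call might look like `map_contigs("AL009126.3.fasta", "assemblies/strainA/contigs.fasta", "strainA_vs_168.sam")`, assuming the reference has been downloaded under that (hypothetical) file name. The index only needs to be built once per reference, so for several dozen strains it would make sense to run the `bwa index` step separately and loop over the `bwa mem` step.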

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by piet ★ 1.9k

Ram · Answer 2 · 2016-03-05

1

8.7 years ago

indexofire ▴ 40

For reference-based assembly, you can try Ragout.

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by indexofire ▴ 40