Hi everyone!
I'm new to genomics and bioinformatics and was hoping to get some advice from seasoned bioinformaticians.
My lab has sequenced the genomes of several dozen strains of Bacillus subtilis, and we've gotten the whole-genome short reads (paired-end, 150 bp, Illumina HiSeq) back. We now need to assemble their genomes. After that we want to do a comparative genomic analysis of them: compare gene content, function, differential regimes of positive selection on genes, shared/unique genes, known/novel genes responsible for ecological adaptations in nature, etc.
Our lab has done comparative studies on these strains before, based on phylogenetic analyses of one or three housekeeping genes, but this is our first time getting our hands on their whole genome sequences - and I don't have any experience with genome assembly.
There are plenty of B. subtilis reference strains available, so I'm guessing a reference-guided assembly would be our safest bet (I'm not sure why we would want to do de novo assembly if we have references available). I'm not familiar with the software or reference-guided assembly pipelines out there. Do you guys have any suggestions for software or pipelines/approaches we should use to assemble our genomes?
Excited to join the field, thanks!
Can SPAdes also do the mapping/reference assembly? I noticed that some de novo assemblers seem to be able to do reference-based assembly as well, but was wondering if it would be better to find a reference-guided tool specifically for that step. Thanks for your advice!
Mapping and assembly are very different things. Usually the best programs are those dedicated to a single task. I recommend using 'bwa mem' to map SPAdes contigs to a reference genome, see here.
I use this often for contigs of bacterial genomes and it works quite well, even though 'bwa mem' is intended for aligning short reads.
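For anyone who wants to script this for several dozen strains, here is a rough sketch of the assemble-then-map workflow described above. It only builds and prints the commands by default. The file names, the output name contigs_vs_ref.sam, and the helper names build_commands and run_commands are made up for illustration; it also assumes spades.py and bwa are on your PATH and that your bwa version supports the -o output option of 'bwa mem'. Treat it as a starting point, not a tested pipeline.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(reads_1, reads_2, reference, outdir, threads=4):
    """Build the commands for one strain: assemble, index reference, map contigs.

    All paths are placeholders supplied by the caller.
    """
    outdir = Path(outdir)
    contigs = outdir / "contigs.fasta"    # SPAdes writes its contigs to this file
    sam = outdir / "contigs_vs_ref.sam"   # hypothetical name for the alignment output
    return [
        # 1. De novo assembly of the paired-end reads with SPAdes.
        ["spades.py", "-1", str(reads_1), "-2", str(reads_2),
         "-t", str(threads), "-o", str(outdir)],
        # 2. Index the reference genome (only needs to be done once per reference).
        ["bwa", "index", str(reference)],
        # 3. Map the assembled contigs to the reference with bwa mem.
        ["bwa", "mem", "-t", str(threads), "-o", str(sam),
         str(reference), str(contigs)],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Placeholder file names; replace with your own reads and reference.
    cmds = build_commands("strain1_R1.fastq.gz", "strain1_R2.fastq.gz",
                          "reference.fasta", "strain1_assembly")
    run_commands(cmds, dry_run=True)
```

Looping build_commands over a list of strain names gives one assembly directory per strain; call run_commands with dry_run=False once the printed commands look right.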