Is it possible to include BBMap in Bcbio-nextgen pipeline? Are there any examples and tutorials on that software besides those posted on their website? Thanks.
It's possible, but BBMap would need integration work before it could be added as a new aligner. Documentation on writing that code is here:
https://bcbio-nextgen.readthedocs.org/en/latest/contents/code.html#aligner
Out of curiosity, why do you prefer BBMap over the aligners already integrated in bcbio, like bwa mem?
I'm curious about BBMap as well. Brian seems to be quite a comprehensive resource on Biostars, so maybe he can comment. Things I can glean from the internet:
- BBMap uses a semi-global alignment algorithm instead of local Smith-Waterman
- There's this poster where Brian benchmarks on synthetic data with varying mismatches and indel sizes, showing good performance for large gaps in alignment/reference coordinates.
- The SAM output is TopHat compatible - so maybe this is designed for RNAseq applications?
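To make the first point concrete, here is a toy Python sketch of the difference between local (Smith-Waterman) and semi-global scoring. It is not BBMap's implementation; the function names, the linear gap penalty, and the scoring values are all assumptions chosen for illustration. "Semi-global" here means the whole read must be aligned, while unaligned reference sequence before and after the read is free.

```python
def local_score(query, ref, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of substrings (scores floor at 0)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            # The 0 lets the alignment restart anywhere, so bad read ends are simply dropped.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def semiglobal_score(query, ref, match=1, mismatch=-1, gap=-1):
    """Semi-global score: every query base is aligned; reference overhangs are free."""
    # Row 0 is all zeros: the read may start at any reference position at no cost.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(query) + 1):
        # Column 0: i query bases aligned against nothing, each paying a gap penalty.
        curr = [i * gap] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # Taking the max over the last row lets the read end at any reference position.
    return max(prev)


if __name__ == "__main__":
    read, reference = "TTAAAATT", "GGAAAAGG"
    print("local:", local_score(read, reference))
    print("semi-global:", semiglobal_score(read, reference))
```

With a read whose ends do not match the reference, the local score only reflects the matching core, while the semi-global score also pays for the mismatching ends, because every base of the read has to be placed.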
So, yes, BBMap is designed for RNA-seq and DNA-seq, and it outperforms all other aligners I've tested when dealing with long indels (or indels in general, but particularly long ones), accuracy-wise. It's also good at aligning very-highly-mutated sequences or very low-quality data. BWA-mem has an advantage in memory use and (usually) alignment speed, though. And while this aspect is not very important with human resequencing, BBMap has a huge advantage in index-building time over other aligners; this is actually quite important when analyzing large numbers of de-novo assemblies of different organisms with different assemblers and parameters.
To process SRA data, you'll want to convert it into standard fastq format using the SRA toolkit (http://www.ncbi.nlm.nih.gov/books/NBK158900/). In bcbio, you can add a custom reference genome for tuberculosis (https://bcbio-nextgen.readthedocs.org/en/latest/contents/configuration.html#adding-custom-genomes) and then run a standard variant calling pipeline by creating a configuration file (https://bcbio-nextgen.readthedocs.org/en/latest/contents/configuration.html#automated-sample-configuration). I haven't personally called variants on tuberculosis, so I don't have any species-specific tips, but I hope this helps you get started.
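As a rough sketch of those two steps, the Python below only builds the command lines (it does not run them). The accession, the template file name, and the metadata CSV name are placeholders, and the output file names assume paired-end data dumped with `--split-files`; check the SRA toolkit and bcbio documentation linked above for the options that fit your data.

```python
import shlex


def fastq_dump_cmd(accession, outdir="fastq"):
    """Command to convert one SRA accession to gzipped fastq, one file per read end."""
    return ["fastq-dump", "--split-files", "--gzip", "--outdir", outdir, accession]


def bcbio_template_cmd(template, metadata_csv, fastq_files):
    """Command for bcbio's automated sample configuration from a template."""
    return ["bcbio_nextgen.py", "-w", "template", template, metadata_csv, *fastq_files]


if __name__ == "__main__":
    accession = "SRR0000000"  # placeholder accession, not a real run
    # Assumed output names for paired-end data with --split-files and --gzip.
    fastqs = [f"fastq/{accession}_1.fastq.gz", f"fastq/{accession}_2.fastq.gz"]
    print(shlex.join(fastq_dump_cmd(accession)))
    # "tb-template.yaml" and "tb-project.csv" are hypothetical file names.
    print(shlex.join(bcbio_template_cmd("tb-template.yaml", "tb-project.csv", fastqs)))
```

The template YAML is where you would point `genome_build` at the custom tuberculosis genome you added in the previous step.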
I'm currently in the process of writing guides for most of the tools in the BBMap package... I've finished several, and BBMap is on my list for this week.