I do some work for a small diagnostics company that has a requirement for small indel detection in 454 data. Those of you familiar with Roche's pipeline will be aware that AVA (Amplicon Variant Analyzer) blissfully ignores indels <3bp. The 454 is also subject to issues with homopolymer runs.
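As a rough illustration of the homopolymer problem, here is a minimal Python sketch that flags whether a candidate indel position sits in or next to a homopolymer run, so such calls can be treated with more suspicion. The function names and the minimum run length of 3 are illustrative assumptions, not part of AVA or any package discussed here.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases
    that is at least min_len long. Positions are 0-based."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def in_homopolymer(seq, position, min_len=3):
    """True if the 0-based position lies inside a homopolymer run
    or on the base immediately flanking it on either side."""
    for start, _base, length in homopolymer_runs(seq, min_len):
        if start - 1 <= position <= start + length:
            return True
    return False


if __name__ == "__main__":
    amplicon = "ACGTTTTTAGGGC"  # made-up sequence for illustration
    print(homopolymer_runs(amplicon))
    print(in_homopolymer(amplicon, 5))
```

A sensible threshold for 454 data would need to be tuned against known variants; the value used here is only a placeholder.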
I'd like to try some alternatives to AVA that focus more on indel than SNP discovery (although SNP discovery is still useful).
Does anyone have any experience of these packages for indel calling with 454 data? Or any additional suggestions? We have looked at certain commercial packages, but they tend to come up slightly short on features, largely ones of scriptability/automation.
I'm the author of a variant detector, FreeBayes, which detects both SNPs and short insertions and deletions using BAM format alignment files. I've posted a note about this in another thread on indel detection.
In short, I strongly recommend you don't use GigaBayes, and instead use FreeBayes, which is a major improvement over GigaBayes in terms of interface, performance, and algorithm. (I need to update our documentation to this effect.)
FreeBayes can handle any insertion or deletion short enough to be spanned by a single read and represented in a single alignment record. If you want to detect long insertions and deletions using 454 reads, you should also look into using Mosaik for your alignment step, as it can be configured to allow very long gapped alignments, although there is obviously a computational penalty for doing so.
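To make "represented in a single alignment record" concrete, here is a minimal Python sketch that lists the insertions and deletions encoded in one SAM/BAM record's CIGAR string. It is not FreeBayes code and does not reflect how FreeBayes works internally; the function name and output tuple layout are assumptions made for illustration. It relies only on the standard CIGAR operators from the SAM format.

```python
import re

# One CIGAR operation: a length followed by a SAM operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(pos, cigar):
    """List (kind, ref_pos, length) for each I or D in a CIGAR string.

    pos is the 1-based leftmost reference position of the alignment,
    as in the SAM POS field. For a deletion, ref_pos is the first
    deleted reference base. For an insertion, ref_pos is the reference
    base immediately after the inserted sequence.
    """
    ref_pos = pos
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":
            found.append(("INS", ref_pos, length))
        elif op == "D":
            found.append(("DEL", ref_pos, length))
        # Only these operators consume reference bases.
        if op in "MDN=X":
            ref_pos += length
    return found


if __name__ == "__main__":
    # Hypothetical record: starts at reference position 100.
    print(indels_from_cigar(100, "10M1I5M2D20M"))
```

An indel longer than the aligner's maximum gap will never appear as an I or D in a record like this, which is why the aligner settings matter for longer events.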
The insertion and deletion support of FreeBayes is still under development. I'm currently working to resolve some confusion about reporting them in the VCF as well as some algorithmic considerations.
To get the ball rolling I actually started with the Variant Identification Pipeline (VIP). Whilst seemingly a good match from the paper, it suffers from a number of issues.
Firstly, the source code does not work out of the box from download: I had to make code-level changes to remove hard-coded paths. The configuration file, and its subsequent use by the pipeline, is very sensitive to missing or trailing slashes on paths. It relies on BioPerl modules deprecated in the 1.6.0 release (Bio::Tools::BPlite in this case). In one case it also uses sequence names as primary keys in the back-end database, meaning that you cannot re-run the pipeline on data that you have already run through once, as it complains about primary keys already in use.
So I'm hoping for a tool a little bit more robust than this.
We are also looking for software capable of detecting not only SNPs but also indels for diagnostic applications. On an 8-sample test run (BRCA1 and BRCA2), both VIP and AVA missed a single-nucleotide insertion. As a next step we will look into the 'segemehl' aligner (the one inconvenience is that it doesn't output SAM format).
The company in question, I should point out, eventually went for a commercial solution from BioGene which is working very well in their hands, but not, of course, open source or cheap ;)
VAAL is potentially better as it is designed for 454 reads.