I've written a software tool that makes genome scaffolds reliably reproducible by expressing the instructions for building a scaffold in a domain-specific language. The software, "Scaffolder," parses this instruction file, fetches the corresponding contig sequences, and joins them into a continuous super-sequence. Keeping the contig-joining instructions in their own file decouples the sequence data from the steps required to build the scaffold.
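To make the parse, fetch, and join steps concrete, here is a minimal Python sketch of the idea. It is not Scaffolder's implementation: the instruction layout (`sequence`, `unresolved`, `source`, `start`, `stop`, `reverse`) and the function names are hypothetical and chosen only for illustration, and the instructions are given as plain Python data rather than read from a file.

```python
# Illustrative sketch only. The instruction keys used here are assumptions,
# not necessarily the format that Scaffolder itself reads.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(instructions, contigs):
    """Join contigs and gaps in the order given by the instructions."""
    parts = []
    for step in instructions:
        if "sequence" in step:
            entry = step["sequence"]
            seq = contigs[entry["source"]]  # fetch the contig by name
            # In this sketch, start/stop are 1-based and inclusive.
            start = entry.get("start", 1)
            stop = entry.get("stop", len(seq))
            seq = seq[start - 1:stop]
            if entry.get("reverse", False):
                seq = reverse_complement(seq)
            parts.append(seq)
        elif "unresolved" in step:
            # A gap of unknown sequence is filled with N characters.
            parts.append("N" * step["unresolved"]["length"])
        else:
            raise ValueError(f"unknown instruction: {step!r}")
    return "".join(parts)


# The data (contigs) and the build steps (instructions) are kept separate.
contigs = {"contig1": "ATGCCGTA", "contig2": "GGGTTTAA"}
instructions = [
    {"sequence": {"source": "contig1", "start": 3, "stop": 6}},
    {"unresolved": {"length": 4}},
    {"sequence": {"source": "contig2", "reverse": True}},
]

scaffold = build_scaffold(instructions, contigs)
print(scaffold)
```

Because the instructions are ordinary data, rerunning the same instructions on the same contigs always rebuilds the same scaffold, and changing the scaffold means editing the instructions rather than the sequences.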
I'm writing on BioStar because I hope this software will be useful to the bioinformatics and genomics community. Any patches, comments or constructive criticism will improve the software and, ideally, make it a useful resource.
In addition, this software has been submitted to the journal Open Research Computation, so any comments made on this question feed directly into the peer-review process for the article. I believe this could be an interesting approach to peer review, and it will add to the suggestions made by the two reviewers.
- Scaffolder website and documentation
- Preprint of the scaffolder manuscript
- GitHub repo for manuscript LaTeX source
- GitHub repo of scaffolder API
- GitHub repo of scaffolder CLI
Please separate suggestions into individual answers so they can be voted on individually. Multiple answers and votes are very welcome.
You should probably include a discussion of the pros/cons of your YAML file format vis-a-vis the standard AGP file format in the manuscript.
The name Scaffolder has already been used for scaffolding software in the original Celera WGS assembler written by Gene Myers: http://www.sciencemag.org/content/287/5461/2196.abstract :(
Can I vote twice? :-)
Vote as many times as you like? :) I feel I'm in unexplored territory.
A different name would be useful then to distinguish the software. I spent a while originally trying to think of different names but Scaffolder was the best I could come up with.
Thanks for the suggestion on AGP. I'll look into this format in more detail. Is there a tool that converts AGP into the corresponding scaffold sequence?
How about "contigs2scaffolds" or "Scaffixer", in honor of the first patented scaffolding technology: http://www.scaffoldersforum.com/scaffolders-forum/2089-history-scaffolding.html. Other potential scaffolding-related terminology can be found here.
Thanks Casey. Scaffolding-related terms are an excellent idea. :)