In the data exploration phase of several bioinformatics projects I have worked on, we looked at the data from many different angles before deciding on appropriate approaches for further analysis. During this phase, the questions were often, "What happens if we vary this parameter, and this parameter as well, and all combinations of the two?", and later, "What happens if we also look at this other parameter?", often for a parameter I hadn't even thought of varying at the outset. And so on.
As this process continues, it takes more and more time to rework the code to carry out the new analyses. It often feels like a delicate balancing act between maintaining enough flexibility that you don't code yourself into a corner, and not spending too much time enabling flexibility you are not sure you will need.
What are the best ways to handle this problem?
Do you keep modifying and refactoring a small collection of ever-more-complex programs?
Do you pipeline a larger number of limited function scripts?
Are object-oriented approaches more appropriate for this type of analysis, or do they require too much design overhead?
Does adherence to a specific development methodology (e.g., Agile) work best?
Are these techniques feasible for a small academic lab with typically one to two people working on a project?
Are there easier ways?
Ryan, great answer. The config file + functional + pipeline method is exactly my approach as well. Do you have any public code that uses ruffus? It would be great to see some real-life examples where it improves the flexibility of the code.
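To make the config file + functional + pipeline idea concrete, here is a minimal stdlib-only Python sketch (the parameter names, values, and `run_analysis` function are made up for illustration; a real setup like ruffus adds dependency tracking on top of this). The point is that sweeping a new parameter means adding one entry to the config, not reworking the analysis code:

```python
from itertools import product

# Hypothetical config: in practice this would live in a JSON/YAML file
# that the pipeline reads, so new parameter sweeps require no code changes.
config = {
    "min_quality": [20, 30],
    "kmer_size": [21, 31],
}

def run_analysis(min_quality, kmer_size):
    # Stand-in for one small, single-purpose analysis step;
    # here it just returns a label identifying the run.
    return f"q{min_quality}_k{kmer_size}"

# Enumerate every combination of the swept parameters and
# push each one through the pipeline step.
results = [
    run_analysis(q, k)
    for q, k in product(config["min_quality"], config["kmer_size"])
]
print(results)  # one result per parameter combination
```

Adding a third parameter later ("What happens if we also vary this?") is then a matter of adding a key to `config` and an argument to the step functions, rather than restructuring the scripts.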
Thanks for the answer! Ruffus looks interesting, and has a good tutorial. I would be very interested in seeing a sample of this set up as well, if you are willing to post one. Thanks again.
I edited my answer to include a link to some example code.
...and keep everything in some kind of version control system, with proper notes to keep track of what you did.