When I send results out to my colleagues, I'm a bit worried that any subsequent work on the project may change those results - particularly when those results are being followed up with benchwork. "Hi, could you send me the top 10 hits from experiment XXX?" Yeah, no problem. But can I send the same ones again once I've refactored this bit of my script, or added some seemingly independent feature to my program?
At present I don't do any consistency checks on my results files, and I was wondering what approaches are used _out there_.
Do you, for example, md5sum every results file and raise a note when those values change (roughly like the sketch below)?
Do you keep a continually updated results silo on Dropbox or similar, and let your colleagues pull results from there?
Do you diff & log before updating any results file?
I'm happy for results to change, and feel that this is an inevitable part of an evolving research project. What I'd like to be aware of is when things change unexpectedly (after refactoring my code or updating my packages / environment), and when results I've previously sent to colleagues have changed due to altered requirements, bug fixes, etc.
Right, but all of that (software versions and such) should be fixed for a given project.
I'm not sure I agree. Certainly the analysis code / environments / dependencies will be static once the project is mothballed, but during its active development these things will necessarily change.
In my experience, software versions rarely if ever change during the course of a project. Otherwise it's a pain to keep track of which version produced which result.
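That said, if a version bump really is unavoidable mid-project, one lightweight way to keep track of which environment produced which result is to drop a small provenance file next to each results file. A rough sketch, where the `.provenance.json` naming convention and the example path are purely illustrative:

```python
# Sketch: record the Python and package versions that produced a results file.
import json
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path

def write_provenance(results_file: str) -> None:
    """Write <results_file>.provenance.json capturing the current environment."""
    record = {
        "results_file": results_file,
        "written": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        # every installed distribution and its version at the time of writing
        "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
    }
    Path(results_file + ".provenance.json").write_text(json.dumps(record, indent=2))

if __name__ == "__main__":
    # example path only; point this at whatever file you are about to send out
    write_provenance("results/top10_hits.tsv")
```

Pinning the environment (conda env export, requirements.txt, or similar) gets you most of the way; the per-file record just makes it obvious later which snapshot a given set of hits came from.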