Operational Tools and Habits
Sunday, November 5th, 2006

This is a (partial) list of operational tools and habits that I have found extremely useful in the past:
- One Click deployment: Each component of the system must be deployable in a fully automated fashion. There must be NO manual steps; a checklist is not good enough. If you can transparently audit it, all the better.
- One Click rollback: Your rollback in case of a failure must be fully automated. Again, there must be NO manual steps.
- The one-click should not require explicit access to the deployment servers.
- If multiple components are to be deployed then this should be done by a script.
- NEVER give developers write access to the production servers; an accident WILL happen if you do. Read access is fine, and since you have one-click, audited, remote deployment, they don't really need write access, do they?
- Everything is built from tagged source in a source management system or is a third-party library of a known version (this may seem obvious, but it is often not the case).
- The separate components in a distributed system must be loosely coupled enough that they can be deployed independently. This is not always possible, but it is definitely worth the effort: you want to avoid having to roll back multiple systems, supported by different groups, because of a single failure.
- If you have to deploy a backward-incompatible change to the communication protocol (it happens), do it as a protocol release on its own.
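To make the deployment and rollback points above more concrete, here is a minimal Python sketch of a one-click script that ships several components from release tags, records every step in an audit log, and rolls everything back automatically if any component fails. The `Component` and `Deployer` names, the `install` and `healthy` hooks, and the `vX.Y.Z` tag convention are all hypothetical; a real system would plug in its own packaging, remote execution (so nobody needs to log in to the servers) and health checks.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical convention: only builds from release tags like "v1.4.2" may ship.
RELEASE_TAG = re.compile(r"v\d+\.\d+\.\d+")


@dataclass
class Component:
    name: str
    live_tag: str                   # tag currently running in production
    install: Callable[[str], None]  # hypothetical hook: remotely install the build for a tag
    healthy: Callable[[], bool]     # hypothetical hook: post-deploy health check


@dataclass
class Deployer:
    audit_log: List[str] = field(default_factory=list)

    def _log(self, message: str) -> None:
        # Every action is recorded so the deployment can be audited afterwards.
        self.audit_log.append(message)

    def deploy(self, plan: List[Tuple[Component, str]]) -> bool:
        """Deploy each (component, tag) pair in order; undo everything on failure."""
        # Refuse anything that is not built from a release tag before touching production.
        for comp, tag in plan:
            if not RELEASE_TAG.fullmatch(tag):
                raise ValueError(f"{comp.name}: {tag!r} is not a release tag")
        done: List[Tuple[Component, str]] = []
        for comp, tag in plan:
            # Record the previous tag first, so a half-finished install is also undone.
            done.append((comp, comp.live_tag))
            self._log(f"deploy {comp.name}: {comp.live_tag} -> {tag}")
            try:
                comp.install(tag)
                comp.live_tag = tag
                if not comp.healthy():
                    raise RuntimeError(f"{comp.name} failed its health check")
            except Exception as exc:
                self._log(f"failure: {exc}; rolling back")
                self.rollback(done)
                return False
        self._log("deploy complete")
        return True

    def rollback(self, done: List[Tuple[Component, str]]) -> None:
        """Reinstall the previous tags in reverse order, with no manual steps."""
        for comp, previous in reversed(done):
            self._log(f"rollback {comp.name} -> {previous}")
            comp.install(previous)
            comp.live_tag = previous


def fake_component(name: str, tag: str, broken_tag: str = None) -> Component:
    """Stand-in component whose health check fails when running `broken_tag`."""
    state = {"tag": tag}
    return Component(
        name,
        tag,
        install=lambda t: state.update(tag=t),
        healthy=lambda: state["tag"] != broken_tag,
    )


if __name__ == "__main__":
    api = fake_component("api", "v1.0.0")
    worker = fake_component("worker", "v2.3.0", broken_tag="v2.4.0")
    deployer = Deployer()
    ok = deployer.deploy([(api, "v1.1.0"), (worker, "v2.4.0")])
    print(ok, api.live_tag, worker.live_tag)  # False v1.0.0 v2.3.0
    print("\n".join(deployer.audit_log))
```

Rolling back in reverse order restores the most recently deployed component first, and because the previous tag is recorded before each install, a component that fails part-way is put back as well. Keeping components independently deployable means the `plan` can usually contain a single entry, which keeps any rollback small.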
The one that generally causes the most fuss is denying developers write access to production; but if your system requires regular access by developers, you have a problem.