(Read 12) The DevOps Handbook
Release year: 2016
Author: Gene Kim et al.
Review
I didn’t originally write a review for this book (these lines were written on 2024-02-11), but after reading it I was inspired to write a blog post titled “Overcoming Learning Anxiety”, which you might find interesting. Long story short, completing this book was a very personal experience that had a profound impact on my career and my life. I can only give it the highest of ratings:
Félix rating: 👍👍👍👍
⭐ Star quotes
- (p. xxxiv) The result of technical debt is that our most fragile artifacts support our most important revenue-generating systems and our most critical projects.
- (p. 14) The Three Ways of DevOps:
  - Work in small batches
  - Have fast feedback and monitoring
  - Have a generative culture
    - Actively seed and share information to enable the organization to achieve its mission
    - Responsibilities are shared throughout the value stream
    - Failure results in reflection and genuine inquiry
- (p. 22) Stop Starting. Start Finishing.
- (p. 27) Waste is any activity that can be bypassed without affecting the result.
- (p. 49) The idea behind blameless postmortems:
  - By removing blame, you remove fear.
  - By removing fear, you enable honesty.
  - Honesty enables prevention.
- (p. 49) In the absence of improvements, processes degrade over time due to chaos and entropy. Even more important than daily work is the improvement of daily work.
- (p. 71) ✅ A system of record is almost always the most difficult part to change in software. It is usually the part that limits our ability to make changes in interdependent systems.
- (p. 107) In the team, we want to:
  - encourage learning
  - help overcome learning anxiety
  - ensure people have relevant skills
  - ensure people have a defined career roadmap
- (p. 172) A recommended definition of “done”: Integrated, tested, working, shippable code, demonstrated in prod-like environment, created from trunk using a one-click process and validated with automated tests.
- (p. 213) The strangler fig application pattern is used to rewrite critical systems and make the transition easier:
  - Place existing functionality behind an API, where it remains unchanged.
  - Implement new functionality using the desired architecture, making calls to the old system when necessary.
- (p. 232) The five universal logging levels:
  - DEBUG: Anything that happens in the program. Often disabled in production, but enabled for troubleshooting.
  - INFO: Actions that are user-driven or system-specific (e.g. “beginning credit card transaction”)
  - WARN: A condition that could potentially become an error (e.g. a database call taking longer than a threshold)
  - ERROR: Error conditions (e.g. API call failures)
  - FATAL: When we must terminate (e.g. a program crash)
- (p. 232) ✅ When deciding whether a message is ERROR or WARN, imagine getting a call for it at 4 AM. For example, “low printer toner” is not an error.
- (p. 239) We need metrics at these levels:
  - Business level: number of sales, sales revenue, A/B testing results, etc.
  - Application level: transaction times, user response times, etc.
  - Infrastructure level (db, OS, network, storage): CPU, disk usage, etc.
  - Client software level: app errors or crashes, etc.
  - Deployment pipeline level: build pipeline status, change deployment lead time, deployment frequency, test environment promotions, environment status, etc.
- (p. 248) After an outage:
  - Ask what indicators would have predicted the outage
  - Add these indicators to the monitoring system, alerting as needed
  - Repeat. We want alerts that prevent outages instead of alerts after a failure.
- (p. 260) The secret to smooth, continuous flow is small, frequent changes that anyone can inspect and easily understand.
- (p. 278) How to frame a hypothesis: “We believe (a change) will result in (a good thing). We will have confidence to proceed when (a measurement indicating a link between the change and the good thing).”
  Example:
  - A change: Increasing the size of images on the booking page
  - A good thing: Improved customer engagement and conversion
  - A measurement: A 5% increase in customers who view images and then book within 2 days
- (p. 285) ✅ Outcomes get worse in proportion to the distance between the person doing the work and the person deciding to do the work.
- (p. 295) Essential elements in a pull request:
  - Detail why the change is made
  - Detail how the change is made
  - Identify risks
  - Identify countermeasures
  - Mention someone appropriate to review the change
- (p. 308) Organizational learning requires engineers who made an error to be enthusiastic in helping the others avoid the same error in the future.
- (p. 309) How to conduct a blameless postmortem
- (p. 310) ✅ ⭐ Instead of feeling bad about a mistake, ask: “Why did it make sense to me when I took that action?”
- (p. 317) A service is not really tested until we break it in production.
- (p. 320) ⭐ ⭐ The only sustainable competitive advantage is an organization’s ability to learn faster than the competition.
- (p. 339) The most valuable thing any associate can do is mentor or learn from other associates. We are all lifelong learners who learn best from our peers.
- (p. 380) A key goal of DevOps practices is to streamline our “normal change” process so that it is also suitable for emergency changes.
- (p. 416) Human error is the effect of systemic vulnerabilities deeper inside the organization. Don’t blame the human; thank the human for highlighting your vulnerabilities.
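To make the strangler fig note (p. 213) more concrete, here is a minimal Python sketch. The `StranglerFacade` class, the invoice/customer operations and every function name are hypothetical; they only illustrate the two steps from the book: existing functionality sits unchanged behind one API, and new code replaces it one operation at a time while still calling the old system when it needs to.

```python
from typing import Callable, Dict

# Hypothetical legacy system: its behaviour stays unchanged behind the facade.
def legacy_get_invoice(invoice_id: int) -> dict:
    return {"id": invoice_id, "source": "legacy"}

def legacy_get_customer(customer_id: int) -> dict:
    return {"id": customer_id, "source": "legacy"}

# Hypothetical mapping, standing in for data that has not been migrated yet.
INVOICE_TO_CUSTOMER = {42: 7}

# Hypothetical new implementation, written with the desired architecture.
def new_get_invoice(invoice_id: int) -> dict:
    # The new code still calls the old system for customer data.
    customer = legacy_get_customer(INVOICE_TO_CUSTOMER[invoice_id])
    return {"id": invoice_id, "source": "new", "customer_source": customer["source"]}

class StranglerFacade:
    """One API in front of old and new code; operations migrate one by one."""

    def __init__(self) -> None:
        # Initially every operation is served by the legacy system.
        self._routes: Dict[str, Callable[[int], dict]] = {
            "invoice": legacy_get_invoice,
            "customer": legacy_get_customer,
        }

    def migrate(self, operation: str, handler: Callable[[int], dict]) -> None:
        # Point a single operation at the new implementation; callers don't change.
        self._routes[operation] = handler

    def call(self, operation: str, key: int) -> dict:
        return self._routes[operation](key)

api = StranglerFacade()
print(api.call("invoice", 42)["source"])   # legacy
api.migrate("invoice", new_get_invoice)
print(api.call("invoice", 42)["source"])   # new
print(api.call("customer", 7)["source"])   # legacy
```

The point of the facade is that callers only ever see `api.call(...)`, so each migration is a small, reversible change rather than a big-bang rewrite.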
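The logging levels and the 4 AM rule (p. 232) map fairly directly onto Python’s standard `logging` module, with one naming difference: Python calls them WARNING (with WARN as an alias) and CRITICAL (with FATAL as an alias). This is a minimal sketch; the logger name, the messages and the `pick_level` helper are hypothetical, and `pick_level` is just one way to encode the “would I want a call at 4 AM?” question.

```python
import logging

logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("booking")  # hypothetical logger name
log.setLevel(logging.INFO)          # DEBUG is usually disabled in production

log.debug("cart contents: %s", ["room-101"])           # suppressed at INFO
log.info("beginning credit card transaction")
log.warning("database call took %d ms (threshold: %d ms)", 850, 500)
log.error("payment API call failed")
log.critical("configuration store unreachable, terminating")


def pick_level(would_want_call_at_4am: bool, could_become_error: bool) -> int:
    """Hypothetical helper for the 4 AM test: only call-worthy conditions are ERROR."""
    if would_want_call_at_4am:
        return logging.ERROR
    if could_become_error:
        return logging.WARNING
    return logging.INFO


# "Low printer toner" could become a problem later, but it's no reason to call anyone at 4 AM.
print(logging.getLevelName(pick_level(False, True)))  # WARNING
```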
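The hypothesis template (p. 278) becomes easier to act on once the “measurement” is an actual number. Below is a rough Python sketch of the booking-page example. It assumes “a 5% increase” means a relative lift in the booking rate (not 5 percentage points), and the cohort sizes and counts are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical experiment group on the booking page."""
    viewed_images: int         # customers who viewed the images
    booked_within_2_days: int  # of those, customers who booked within 2 days

    @property
    def booking_rate(self) -> float:
        return self.booked_within_2_days / self.viewed_images


def relative_lift(control: Cohort, treatment: Cohort) -> float:
    """Relative change in booking rate, e.g. 0.05 means a 5% increase."""
    return treatment.booking_rate / control.booking_rate - 1.0


def confident_to_proceed(control: Cohort, treatment: Cohort, target: float = 0.05) -> bool:
    # "We will have confidence to proceed when" the measured lift reaches the target.
    return relative_lift(control, treatment) >= target


# Made-up numbers: current image size (control) vs. larger images (treatment).
control = Cohort(viewed_images=10_000, booked_within_2_days=800)    # 8.0%
treatment = Cohort(viewed_images=10_000, booked_within_2_days=860)  # 8.6%
print(f"lift: {relative_lift(control, treatment):.1%}")  # lift: 7.5%
print(confident_to_proceed(control, treatment))           # True
```

A real experiment would also check that the lift is statistically significant before proceeding; this sketch deliberately leaves that out to keep the mapping from the template to code visible.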