Author: Gene Kim, Jez Humble, Patrick Debois, John Willis

Release year: 2016

Publisher: IT Revolution

My review

The DevOps Handbook has been a transformative force in my life. I’ve read it three times now, each reading marking a different phase of my personal and professional growth. The first attempt, as a Contxtful employee, was a complete failure—I couldn’t get past page 50 and learned nothing. The second time, at Nesto, I had developed a better learning system and finally finished it. That review still exists, and it’s fascinating to compare how my learning system has evolved since then.

That second reading was life-changing. It introduced me to the concept of learning anxiety—two words I’d seen separately but never connected. The concept perfectly described my reality and pushed me to explore my limits in ways I never had before. It was so transformative that I ended up writing an entire book about it.

Now, on my third reading, I’m a different person. I’ve added 113 reviews to this website, discovered countless new concepts and star quotes, and matured significantly. A quote from The 7 Habits of Highly Effective People keeps resonating with me:

“We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.”

–T.S. Eliot

I keep circling back to The DevOps Handbook. Despite believing it’s a groundbreaking, life-changing book, I want this to be my final review. I’ve extracted every ounce of value from it, and I’m putting everything I have into this review because I want to be done with it. I want to move on, to truly put down the book and say “I get it.” So here we go.

What is the book about as a whole?

This book addresses a fundamental organizational challenge: teams that must collaborate often have diametrically opposed objectives that make collaboration nearly impossible, despite the best intentions on both sides. The core tension is between Dev(elopers) and Op(erator)s—they push in opposite directions. Developers want fast change to test new ideas, while Operators want to limit the rate of change to provide stability and security.

This relationship has existed throughout history, long before software was invented. The book goes deeper than software—it’s about systematically aligning objectives between conflicting teams.

What is being said in detail, and how?

The book presents a framework called The Three Ways—three fundamental steps organizations can take to bridge the gap between Dev and Ops teams and align their objectives and methods.

The First Way focuses on streamlining the hand-off between Dev and Ops to minimize friction. In modern software development, this means moving beyond the old model where Dev throws code to Ops with deployment instructions and lets them figure it out. Instead, both teams collaborate to create an automated pipeline that transfers software from Dev to Ops seamlessly. This brings automation and reliability, and it eliminates the need for Ops to interpret Dev’s deployment instructions. The pipeline becomes the translation layer between teams, ensuring that future deployment outages won’t stem from misunderstandings between Dev and Ops.

The Second Way involves monitoring the outcome of every deployment to inform Dev’s decisions before the next deployment. It’s an informational pipeline that instruments deployments. The topic is often technical and misunderstood, so I think of it this way: if the First Way is swinging a bat at a baseball game, the Second Way means doing it with your eyes open to see where the ball will land, so you can better plan which base to reach.

The Third Way is the most mysterious yet essential element. It’s about creating a human culture that digests all the outcomes and data points you’ve gathered. Having a dashboard is useless if no one interprets the data over time. If the same incidents keep repeating, you’re not learning. This step confuses people because it seems paradoxical—when you’ve reached “peak system competency,” it feels counterintuitive to step back, sit in a circle, and discuss what went well, what went wrong, and what to try next. The real work lies in finding ways to eliminate blame and instead focus on what you’re ready to try next to make things slightly better. By developing the habit of such discussions, improvement becomes inevitable.

The book provides real-world examples of corporations that learned these lessons the hard way, often describing the remarkable positive results that followed when they embraced DevOps principles.

What did I learn?

I learned three key things from this third reading:

How to quote citations properly: a surprisingly important skill I’d been missing
The absolute necessity of writing reviews: I didn’t do this last time, and I wish I had, so I could see the impact of the improvements to my learning system
Validation of my learning method: my learning-without-anxiety approach helped me understand this book better than ever before, even better than during my “successful” second reading

Here’s the thing: I didn’t expect to learn many new DevOps concepts on this third reading, especially since I finally understood the Three Ways during my second attempt. By 2023, I was already seeing DevOps everywhere—even in how my partner and I handled household chores. DevOps had become more than a concept; it was a way of life.

However, between that second reading and now, I’ve become disheartened to see that most of the world doesn’t interpret DevOps the same way I do. I’m not claiming to hold the truth, but I fear that many organizations look at the titans—Amazon, Google, Facebook—see that “DevOps” made them successful, and think “I want that too.” So they hire “DevOps engineers” and create “DevOps teams” to solve their velocity issues.

It doesn’t work like that. I think more people should read Sooner Safer Happier for a reality check.

The most surprising thing I learned this time was how to properly attribute quotes. After all this time, I couldn’t believe I had never learned it. Comparing my current notes with my 2022 notes, I realized I often credited the book’s authors with quotes they were actually citing from someone else. When I wanted to use these quotes in Overcoming Learning Anxiety, I discovered that about a third of them were misattributed. It’s extremely difficult to correct attribution later, so doing it right the first time is crucial. The authors of The DevOps Handbook did an excellent job with their citations, a detail I hadn’t noticed before. I was able to use their notes to correct many errors in my book.

There’s also a passage that seemed abstract during my first reading but now resonates deeply with my experience at a previous employer. It describes the painful reality of watching a slow-motion trainwreck:

p. 57 The Downward Spiral in Three Acts:

We must keep the app running to deliver value. Many of our problems are due to apps and infra that are complex, poorly documented, and fragile. Our most fragile artifacts support our most important revenue-generating systems and critical projects. When changes fail, they jeopardize our most important organizational promises.

Somebody has to compensate for the latest broken promise. Oblivious to what technology can or can’t do, or what factors led to missing our earlier commitment, they commit the tech organization to deliver upon this new promise. Thus, devs are tasked with another urgent project that inevitably requires solving new challenges and cutting corners to meet the promised deadline, further adding to tech debt. “We’ll fix when we have more time.”

Everything starts taking longer. Work requires more coordination. Our product delivery cycles continue to move slower and slower. Fewer projects are undertaken, and those that are, are less ambitious. We are no longer able to respond quickly to our changing competitive landscape, nor provide stable, reliable service to our customers.

Result: We lost in the marketplace.

Another passage I found fascinating, which I didn’t emphasize enough in my first review, covers improving constraints:

p. 135 The five focusing steps to improve a constraint:

Identify the system’s constraint

Decide how to exploit the system’s constraint

Subordinate everything else to the above decisions

Elevate the system’s constraint

If a constraint has been broken in the previous steps, go back to step one, but do not allow inertia to cause a system constraint
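To make steps one, four, and five concrete, here is a minimal Python sketch under a deliberately simple assumption: a strictly serial value stream where each stage has a fixed capacity in items per day, so end-to-end throughput is just the capacity of the slowest stage. The stage names, capacities, and helper functions are hypothetical, and real constraints are rarely this easy to measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    capacity_per_day: float  # hypothetical: items this stage can finish per day


def throughput(stages: list[Stage]) -> float:
    """In a strictly serial stream, the slowest stage caps end-to-end flow."""
    return min(s.capacity_per_day for s in stages)


def find_constraint(stages: list[Stage]) -> Stage:
    """Step 1: identify the constraint (here, simply the lowest-capacity stage)."""
    return min(stages, key=lambda s: s.capacity_per_day)


def elevate(stages: list[Stage], name: str, factor: float) -> list[Stage]:
    """Return a new stream with one stage's capacity multiplied by factor."""
    return [
        Stage(s.name, s.capacity_per_day * factor) if s.name == name else s
        for s in stages
    ]


stream = [Stage("dev", 10), Stage("test", 4), Stage("deploy", 6)]

print(find_constraint(stream).name)  # test
print(throughput(stream))  # 4

# Improving a non-constraint changes nothing ("an illusion")...
print(throughput(elevate(stream, "dev", 2)))  # 4

# ...while elevating the constraint helps until a new constraint appears (step 5).
improved = elevate(stream, "test", 2)
print(throughput(improved))  # 6
print(find_constraint(improved).name)  # deploy
```

The point of the toy model is the last two lines: once the constraint is elevated, another stage becomes the constraint, which is why step five sends you back to step one.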

Other highlights worth noting:

p. 159 The six types of feedback in software development:

Dev tests (did I write the code I intended?)

Continuous integration and testing (did we respect the existing expectations of the code?)

Exploratory testing (have we introduced any unintended consequences?)

Stakeholder feedback (as a team, are we headed in the right direction?)

User feedback (do our customers love it?)

p. 235 [Getting something out (MVP) is better than getting something perfect.]

p. 236 [Finishing something is better than starting (limiting WIP and focusing on top priorities).]

p. 594 Tom Limoncelli, co-author of “The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems” and former SRE at Google:

“When people ask me for recommendations on what to monitor, I joke that in an ideal world, we should delete all the alerts we currently have in our monitoring system. Then, after each user-visible outage, we’d ask what indicators would have predicted that outage and then add those to our monitoring system, alerting as needed. Repeat. Now we only have alerts that prevent outages, as opposed to being bombarded by alerts after an outage already occurred.”
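Limoncelli’s thought experiment can be sketched as a tiny alert registry that starts empty and only grows through postmortems. This is a hypothetical Python illustration, not any real monitoring system’s API; the `AlertPolicy` class, the `disk_free_pct` metric, and the 10% threshold are all made up for the example.

```python
from typing import Callable

Metrics = dict[str, float]
Rule = Callable[[Metrics], bool]


class AlertPolicy:
    """Hypothetical alert registry that starts empty, as in Limoncelli's joke."""

    def __init__(self) -> None:
        self.rules: dict[str, Rule] = {}

    def learn_from_outage(self, indicator: str, rule: Rule) -> None:
        """After a user-visible outage, add the indicator that would have predicted it."""
        self.rules[indicator] = rule

    def firing(self, metrics: Metrics) -> list[str]:
        """Names of the rules that fire for the current metric values."""
        return [name for name, rule in self.rules.items() if rule(metrics)]


policy = AlertPolicy()  # "delete all the alerts we currently have"
print(policy.firing({"disk_free_pct": 3.0}))  # [] (nothing alerts yet)

# A postmortem finds the outage was preceded by a nearly full disk.
policy.learn_from_outage(
    "disk_nearly_full", lambda m: m.get("disk_free_pct", 100.0) < 10.0
)
print(policy.firing({"disk_free_pct": 3.0}))  # ['disk_nearly_full']
print(policy.firing({"disk_free_pct": 50.0}))  # []
```

Every rule in the registry exists because some past outage justified it, which is the property Limoncelli is after.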

How can I use this?

Much has changed for me since my first and second readings. I’m no longer a “DevOps engineer”: I’ve become a “Scrum Master.” If developing software were moving furniture, I wouldn’t carry the couch; I’d make sure nothing gets in the way of the people moving it, and that they’re heading in the right direction.

I won’t be professionally developing pipelines and dashboards (thank goodness!), but I will continue hosting blameless postmortems, retrospectives, and similar practices. I like to believe that my example is subtly teaching people what DevOps stands for.

I can use these ideas to maintain a positive mindset and understand that problems are rarely an individual’s fault. When issues arise, I can develop the habit of asking, “Is there anything I could have done to prevent this incident? What about the next one?”

These principles help me maintain a positive outlook, aiming for the stars by taking one step at a time. They keep me patient and focused on real issues that require more than days, weeks, or even months to solve.

Interestingly, I’ve even applied these ideas to writing my book. I created a pipeline for automatic web publishing, found ways to improve every flaw I discovered, and gathered feedback from readers. Essentially, I’ve incorporated DevOps principles into everything I do because they make my life so much easier.

Why must I use this?

Because it saves time and energy. Now when something doesn’t work, instead of blaming, feeling bad, or hiding in a corner, I assume everyone involved wants to succeed but can’t find the answer that will solve everything. So I try to gather everyone’s opinion with the goal of finding one thing we can try that should definitely improve our situation, even if just slightly. I do this as often as possible, and the results are dramatic.

It’s like investing—get-rich-quick schemes don’t work. It’s the constant, consistent investments that make a monumental difference over time.

When will I use this?

Every day. I might stop using the word “DevOps” because it pains me that some people don’t share my vision of what it represents. The word becomes useless if, when I say “DevOps,” you only think of the First Way and nothing more. In fact, I was hurt when one of my managers told me, “Whatever you’re doing, please stop calling it DevOps, because it’s not DevOps.”

So I’ll continue what I’m doing, but I won’t call it DevOps. I’ll stay connected to the DevOps community and keep sharing these ideas. I’ll do everything I can to ensure that I, and the people I work with, continue making intelligent, interesting mistakes that are never the same as the ones we’ve made before.

Félix rating:
👍👍

📚 Vocabulary

value stream: (as defined by Karen Martin and Mike Osterling)
- the sequence of activities an organization undertakes to deliver upon a customer request
- the sequence of activities required to design, produce and deliver a good or service to a customer, including the dual flows of information and material

⭐ Star Quotes

(⚠️ Note: The page numbers here match the digital version I read on my Kobo Libra, which had 1096 pages. The printed 2nd edition of this book has 528 pages.)

Note from the publisher on the second edition

Foreword to the second edition

Foreword to the first edition

Preface - Aha!

(p. 41) Many DevOps practices emerge if we continue to manage our work beyond the goal of “potentially shippable code” at the end of each iteration, extending it to having our code always in a deployable state, with developers checking into trunk daily, and if we demonstrate our features in production-like environments.
(p. 44) “DevOps isn’t about automation, just as astronomy isn’t about telescopes.” —Christopher Little

Introduction: Imagine a World Where Dev and Ops Become DevOps

(p. 61) As the saying goes, “It is virtually impossible to make any business decision that doesn’t result in at least one IT change.”

Part I: The Three Ways

Part I: Introduction

Chapter 1: Agile, Continuous Delivery, and the Three Ways

(p. 104) The flow metrics:
- Flow velocity (flow items completed per period, tells if delivery is accelerating)
- Flow efficiency (share of an item’s flow time spent actively worked on rather than waiting)
- Flow time (how long it takes for a flow item to cross the value stream)
- Flow load (number of active/waiting flow items in a value stream, aka WIP)
- Flow distribution (the proportion of each flow item type in a value stream)
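These five metrics are the ones defined in Mik Kersten’s Flow Framework. As a rough sketch, assuming each flow item records its type, the day it entered the value stream, the day it finished (if it has), and how many of those days it was actively worked on, they could be computed like this in Python. The `FlowItem` fields, the day-based units, and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowItem:
    kind: str                # hypothetical types: "feature", "defect", "risk", "debt"
    started: int             # day the item entered the value stream
    finished: Optional[int]  # day it was done, or None if still in flight
    active_days: int         # days someone was actually working on it


def flow_velocity(items: list[FlowItem], start: int, end: int) -> int:
    """Flow velocity: items completed in the half-open period [start, end)."""
    return sum(
        1 for i in items if i.finished is not None and start <= i.finished < end
    )


def flow_time(item: FlowItem) -> int:
    """Flow time: elapsed days from entering the value stream to done."""
    if item.finished is None:
        raise ValueError("item is still in flight")
    return item.finished - item.started


def flow_efficiency(item: FlowItem) -> float:
    """Flow efficiency: share of flow time spent active rather than waiting.

    Assumes flow time is at least one day.
    """
    return item.active_days / flow_time(item)


def flow_load(items: list[FlowItem], day: int) -> int:
    """Flow load: items started but not yet finished on a given day (WIP)."""
    return sum(
        1
        for i in items
        if i.started <= day and (i.finished is None or i.finished > day)
    )


def flow_distribution(items: list[FlowItem]) -> dict[str, float]:
    """Flow distribution: proportion of each flow item type."""
    counts = Counter(i.kind for i in items)
    return {kind: n / len(items) for kind, n in counts.items()}


items = [
    FlowItem("feature", started=0, finished=10, active_days=4),
    FlowItem("defect", started=2, finished=5, active_days=3),
    FlowItem("feature", started=3, finished=None, active_days=2),
    FlowItem("debt", started=6, finished=9, active_days=1),
]

print(flow_velocity(items, 0, 14))  # 3
print(flow_time(items[0]), flow_efficiency(items[0]))  # 10 0.4
print(flow_load(items, 4))  # 3
print(flow_distribution(items))  # {'feature': 0.5, 'defect': 0.25, 'debt': 0.25}
```

The first feature shows why flow efficiency matters: it took 10 days to cross the value stream but was only worked on for 4 of them, so most of its flow time was spent waiting.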

Chapter 2: The First Way: The Principles of Flow

(p. 123) Work is not done when Development completes the implementation of a feature. Rather, it is only done when our application is running successfully in production, delivering value to the customer.
(p. 126) “Controlling queue size [WIP] is an extremely powerful management tool, as it is one of the few leading indicators of lead time.” –Dominica DeGrandis
(p. 126) ⭐ ✅ When we limit WIP, we find that we may have nothing to do because we are waiting on someone else. Although it may be tempting to start new work (i.e. “It’s better to be doing something than nothing”), a far better action would be to find out what is causing the delay and help fix that problem.
(p. 126) ⭐ Bad multitasking often occurs when people are assigned to multiple projects, resulting in prioritization problems.
(p. 126) ⭐ “Stop starting, start finishing.” –David J. Anderson, author of Kanban: Successful Evolutionary Change for Your Technology Business
(p. 135) ⭐ “In any value stream, there is always a direction of flow, and there is always one and only one constraint; any improvement not made at that constraint is an illusion.” —Dr. Eliyahu Goldratt
(p. 135) ⭐ “Waste is the use of any material or resource beyond what the customer requires and is willing to pay for.” –Shigeo Shingo
(p. 135) The five focusing steps to improve a constraint:
- Identify the system’s constraint
- Decide how to exploit the system’s constraint
- Subordinate everything else to the above decisions
- Elevate the system’s constraint
- If a constraint has been broken in the previous steps, go back to step one, but do not allow inertia to cause a system constraint
(p. 139) “Waste and hardship in the software development stream is anything that causes delay for the customer, such as activities that can be bypassed without affecting the result.” —Mary and Tom Poppendieck
(p. 148) “‘No’ is okay, as long as it’s followed up with another idea to try. Because if I have a lousy idea, but it’s the only idea out there, then you know what? My lousy idea is the best idea we got going, and so that’s the one we try.” —Dr. Chris Strear, an emergency physician for more than 19 years.
(p. 148) “People behave based on how they’re measured and how they’re rewarded.” —Dr. Chris Strear
(p. 149) “Who cares about flow through an individual unit?” —Dr. Chris Strear
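The WIP-limit quotes above (DeGrandis, Anderson) boil down to a simple rule for a board column: when it is full, you don’t pull new work, you go help finish what is already there. A minimal Python sketch, with a hypothetical `KanbanColumn` class and made-up card names:

```python
class KanbanColumn:
    """A hypothetical board column that refuses new work beyond its WIP limit."""

    def __init__(self, name: str, wip_limit: int) -> None:
        self.name = name
        self.wip_limit = wip_limit
        self.cards: list[str] = []

    def can_start(self) -> bool:
        return len(self.cards) < self.wip_limit

    def start(self, card: str) -> bool:
        """Pull a card in only if there is room. A False result means:
        stop starting, go help finish (or unblock) an existing card."""
        if not self.can_start():
            return False
        self.cards.append(card)
        return True

    def finish(self, card: str) -> None:
        self.cards.remove(card)


doing = KanbanColumn("In progress", wip_limit=2)
print(doing.start("login page"))      # True
print(doing.start("fix flaky test"))  # True
print(doing.start("new report"))      # False: limit reached
doing.finish("fix flaky test")
print(doing.start("new report"))      # True
```

The design choice is that the board itself says “no” when the limit is hit, so the team is nudged toward finishing work instead of quietly starting more.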

Chapter 3: The Second Way: The Principles of Feedback

(p. 173) “It’s impossible for a developer to learn anything when someone yells at them for something they broke six months ago—that’s why we need to provide feedback to everyone as quickly as possible, in minutes, not months.” —Gary Gruver
(p. 174) Our most important customer is our next step downstream. Optimizing our work for them requires that we have empathy for their problems in order to better identify the design problems that prevent fast and smooth flow.

Chapter 4: The Third Way: The Principles of Continual Learning and Experimentation

(p. 180) “Responses to incidents and accidents that are seen as unjust can impede safety investigations, promote fear rather than mindfulness in people who do safety-critical work, make organizations more bureaucratic rather than more careful, and cultivate professional secrecy, evasion, and self-protection.” —Dr. Sidney Dekker
(p. 183) When accidents and failures occur, instead of looking for human error, […] look for how we can redesign the system to prevent the accident from happening again.
(p. 184) ⭐ “By removing blame, you remove fear; by removing fear, you enable honesty; and honesty enables prevention.” —Bethany Macri, engineer at Etsy who created the Morgue tool
(p. 185) “Even more important than daily work is the improvement of daily work.” —Mike Orzen, author of Lean IT
(p. 193) Greatness is not achieved by leaders making all the right decisions—instead, the leader’s role is to create the conditions so their team can discover greatness in their daily work.

Part I: Conclusion

Part II: Where to Start

Part II: Introduction

Chapter 5: Selecting Which Value Stream to Start With

(p. 234) “Culture eats strategy for breakfast.” —Peter Drucker
(p. 238) [Transform “When will this project be done?” into “When do we start seeing value?”]

Chapter 6: Understanding the Work in Our Value Stream, Making It Visible, and Expanding It Across the Organization

(p. 248) One of the most efficient ways to start improving any value stream is to conduct a workshop with all the major stakeholders and perform a value stream mapping exercise […] to help capture all the steps required to create value.
(p. 255) Our goal is not to document every step and associated minutiae, but to sufficiently understand the areas in our value stream that are jeopardizing our goals of fast flow, short lead times, and reliable customer outcomes.
(p. 260) Bureaucracies are incredibly resilient and are designed to survive adverse conditions—one can remove half the bureaucrats, and the process will still survive.
(p. 266) A problem common to any process improvement effort is how to properly prioritize it—after all, organizations that need it most are those that have the least amount of time to spend on improvement.

Chapter 7: How to Design Our Organization and Architecture With Conway’s Law In Mind

(p. 281) Conway’s Law states that organizations which design systems… are constrained to produce designs which are copies of the communication structures of these organizations… The larger an organization is, the less flexibility it has and the more pronounced the phenomenon.
(p. 301) When we value people merely for their existing skills or performance in their current role rather than for their ability to acquire and deploy new skills, we (often inadvertently) reinforce what Dr. Carol Dweck describes as the fixed mindset.
(p. 302) ⭐ We want to encourage learning, help people overcome learning anxiety, help ensure that people have relevant skills and a defined career road map, and so forth. By doing this, we help foster a growth mindset in our engineers.

Chapter 8: How to Get Great Outcomes by Integrating Operations Into the Daily Work of Development

Part II: Conclusion

Part III: The First Way: The Technical Practices of Flow

Part III: Introduction

Chapter 9: Create the Foundations of Our Deployment Pipeline

Chapter 10: Enable Fast and Reliable Automated Testing

(p. 410) “Although testing can be automated, creating quality cannot. To have humans executing tests that should be automated is a waste of human potential.” —Elisabeth Hendrickson (“On the Care and Feeding of Feedback Cycles”)

Chapter 11: Enable and Practice Continuous Integration

Chapter 12: Automate and Enable Low-Risk Releases

(p. 504) ⭐ A DevOps team is a team that brings together Dev and Ops onto one team. It is not a team of DevOps engineers.

Chapter 13: Architect for Low-Risk Releases

Part III: Conclusion

Part IV: The Second Way: The Technical Practices of Feedback

Part IV: Introduction

Chapter 14: Create Telemetry to Enable Seeing and Solving Problems

(p. 579) When metrics aren’t actionable, they are likely vanity metrics that provide little useful information – these we want to store, but likely not display, let alone alert on.

Chapter 15: Analyze Telemetry to Better Anticipate Problems and Achieve Goals

Chapter 16: Enable Feedback so Development and Operations Can Safely Deploy Code

(p. 622) Because production deployments are one of the top causes of production issues, each deployment and change event is overlaid onto our metric graphs to ensure that everyone in the value stream is aware of relevant activity, enabling better communication and coordination, as well as faster detection and recovery.

Chapter 17: Integrate Hypothesis-Driven Development and A/B Testing Into Our Daily Work

(p. 644) “The most inefficient way to test a business model or product idea is to build the complete product to see whether the predicted demand actually exists.” –Jez Humble
(p. 646) The period when experimentation has the highest value is during peak traffic seasons.
(p. 650) Based on research from Dr. Ronny Kohavi, Distinguished Engineer at Microsoft, two-thirds of features either have a negligible impact or actually make things worse. All these features were originally thought to be reasonable, good ideas. In summary, user testing > intuition and expert opinion.
(p. 659) ⭐ Success requires us to not only deploy and release software quickly but also to out-experiment our competition.

Chapter 18: Create Review and Coordination Processes to Increase the Quality of Our Current Work

(p. 668) ⭐ Building high-trust cultures is likely the largest management challenge of this decade.
(p. 670) As has been proven time and again, the further the distance between the person doing the work (i.e. the change implementer) and the person deciding to do the work (i.e. the change authorizer), the worse the outcome.
(p. 692) “In a pull request, there must be sufficient detail on why the change is being made, how the change was made as well as any identified risks and resulting countermeasures. If something bad or unexpected happens upon deployment, it is added to the pull request, with a link to the corresponding issue.” –Ryan Tomayko, CIO and co-founder of GitHub and one of the inventors of the pull request process.

Part IV: Conclusion

Part V: The Third Way: The Technical Practices of Continual Learning and Experimentation

Part V: Introduction

Chapter 19: Enable and Inject Learning Into Daily Work

(p. 713) “Human error is not our cause of troubles; instead, human error is a consequence of the design of the tools that we gave them.” –Dr. Sidney Dekker
(p. 719) “In that moment when we do something that causes the entire site to go down, we get this ‘ice-water down the spine’ feeling, and likely the first thought through our head is, ‘I suck and I have no idea what I’m doing.’ We need to stop ourselves from doing that, as it is the route to madness, despair, and feelings of being an imposter, which is something that we can’t let happen to good engineers. The better question to focus on is, ‘Why did it make sense to me when I took that action?’” –Ian Malpass, engineer at Etsy
(p. 720) It is not acceptable to have a countermeasure to merely “be more careful” or “be less stupid” – instead, we must design real countermeasures to prevent these errors from happening again.

Chapter 20: Convert Local Discoveries Into Global Improvements

(p. 751) “The actual compliance of an organization is in direct proportion to the degree to which its policies are expressed as code.” — Justin Arbuckle

Chapter 21: Reserve Time to Create Organizational Learning and Improvement

(p. 786) ⭐ “The most valuable thing any associate can do is mentor or learn from other associates.” — Steve Farley

Part V: Conclusion

Part VI: The Technological Practices of Integrating Information Security, Change Management, and Compliance

Part VI: Introduction

Chapter 22: Information Security Is Everyone’s Job Every Day

(p. 845) Any CI/CD pipeline can be compromised to insert malicious payloads.

Chapter 23: Protecting the Deployment Pipeline

(p. 869) A key goal of DevOps practices is to streamline our normal change process such that it is also suitable for emergency changes.

Part VI: Conclusion

A Call to Action: Conclusion to the DevOps Handbook

(p. 904) Innovation is impossible without risk-taking, and if you haven’t managed to upset at least some people in management, you’re probably not trying hard enough.
(p. 904) Don’t let your organization’s immune system deter or distract you from your vision.
(p. 908) “High productivity is masking an exhausted workforce.” – 2021 Work Trend Index: Annual Report
(p. 914) “High performance starts with organizations whose leadership focuses on building an environment where people from different backgrounds and with different identities, experiences, and perspectives can feel psychologically safe working together, and where teams are given the necessary resources, capacity, and encouragement to experiment and learn together in a safe and systematic way.” — Jez Humble
(p. 915) “My definition of DevOps is ‘everything you do to overcome the friction between silos.’ All the rest is plain engineering.” — Patrick Debois

(Read 126) The DevOps Handbook