The Phoenix Project
A summary and a review of the Phoenix Project
by Danver Braganza on 2020-11-29
Our engineering team recently read The Phoenix Project as the first book of our book club. The book came highly recommended by multiple senior members of our team, and several things about it intrigued me. Perhaps the most unconventional thing I knew about the book going in was that, unlike most other offerings in the Business/Engineering Self-Learning genre, it is a novel.
Judging a book by its cover
The Phoenix Project is a mix between a parable and a cautionary tale, as it weaves the story of an IT[1] manager, Bill, newly promoted to lead the entire dysfunctional engineering department of Parts Unlimited, an embattled auto parts manufacturing company. The success of the entire company hinges on the eponymous Phoenix Project, which is supposed to modernize their e-commerce presence and increase their revenue. However, when the book opens, the Phoenix Project is already overdue by several months. Engineering is held responsible, and the patience of the CEO and the board is wearing out. Bill’s promotion looks more and more like a sentence.
A note before you read further: because this is a work of fiction, some of what follows contains spoilers. I don’t think they will detract from your enjoyment of the book, but if that’s a problem, skip down to How to get the most out of this book, and come back after reading it if you like.
As you follow Bill and his team through their journey, you see how they act out the book’s main thesis, the application of the Three Ways. They work through multiple episodes and incidents, modelling in their lightly-fictionalized way how to implement process, respond to failures and continually increase communication and integration throughout the organization. Each little battle along the way serves both as a pattern of real-world behaviour that can be applied in practice and as an object lesson on a particular point of the philosophy of the Three Ways.
At times it is hard to miss that the characters and incidents are fictional. The story unfolds on a compressed timeline, like an episode of 24. The whole engineering department of Parts Unlimited seems comfortable spending their nights and weekends at work to an unhealthy degree. Sweeping changes to policies and processes are rolled out to departments overnight, and the employees adapt to these new workflows and habits with effectively zero friction.
And finally, in what I found to be the most extraordinary suspension of real-world laws, the vice presidents and managers of these teams were almost always able to communicate their goals, incentives and metrics dispassionately, and come to rational conclusions within the space of a few sentences. Many an argument starts off with both parties butting heads, only to resolve with the right turn of phrase. It was like watching a debate between Spock and Data: the human element, where people need to come to terms with an idea before being sold on it, was intentionally underplayed.
This is not to be interpreted as a criticism of the Phoenix Project. Keeping a focus on high output while also maintaining a responsible work-life balance is complicated. Changing the habits of an organization requires commitment and conscious application. And effectively convincing other people of your ideas is a skill of its own. Books have already been written, and will continue to be written, about each of these domains, and so the Phoenix Project is justified in treating them lightly. Once we accept that the novel has frictionless spherical cows, we can use the model to study the ballistics of the Three Ways.
The Three Ways
This book is an adaptation of Eliyahu Goldratt’s novel, The Goal, which follows a roughly similar plot set in a manufacturing plant. Although I haven’t read The Goal, I gather that the Three Ways remain unchanged. The main thesis of the Phoenix Project is that the same concepts used in modern manufacturing to streamline the flow of work pieces through stations in a factory can be used to streamline the flow of software projects through the different “stations” of the Software Development Lifecycle.
Conversely, the common problems that plague a dysfunctional engineering team can be analyzed via the same lens of modelling tickets, stories or features as work pieces and cataloging the steps they need to go through as the various stations of a plant floor.
The first way: Maximise the flow of work by focusing on constraints
On a plant floor, raw materials enter, and if everything goes right, finished products eventually leave. In between, each workpiece, or partly assembled product, has some set of operations it needs to undergo. These operations occur at set stations, such as a painting shed or a curing oven. If your painting station is busy, then the work coming into that station needing a paint job will be delayed. Since many modern plants are set up to do different kinds of just-in-time manufacturing, they do not form a simple Assembly Line, but an assembly network.
Software projects, too, form an assembly network. Since we’re talking software, I hope we’re all comfortable modelling the flow of work through our organization as a Directed Acyclic Graph, at least in principle. Projects, stories, features and bugs (pieces of work) start at some kind of reporting/detection/inception phase, then move through triage, estimation, design, development, verification and deployment. Each of these phases might encompass multiple software stations; for instance, all changes to the network might need their design approved by a specific network security team before the project may advance.
In a flow network, the Max-Flow/Min-Cut Theorem says that the maximum flow you can push through the network equals the total capacity of the edges in the minimum cut, i.e. the bottleneck. Optimisations you make anywhere other than the bottleneck won’t improve the flow you receive at the output.
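To make the theorem concrete, here is a minimal sketch in Python. The stations, their names and their weekly capacities are made up for illustration, and capacities sit on the edges between stations. The algorithm is the standard Edmonds-Karp method for computing a maximum flow, not anything the book prescribes.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest residual paths (BFS)."""
    # Residual capacities keyed by (u, v); reverse edges start at zero.
    residual = {}
    neighbours = {}
    for (u, v), cap in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + cap
        residual.setdefault((v, u), 0)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    flow = 0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow equals the min cut

        # The narrowest edge on the path limits how much we can push.
        path_flow = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            path_flow = min(path_flow, residual[(u, v)])
            v = u
        # Push that amount forward and record it on the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= path_flow
            residual[(v, u)] += path_flow
            v = u
        flow += path_flow


# Hypothetical SDLC network; capacities are pieces of work per week.
sdlc = {
    ("triage", "design"): 30,
    ("design", "dev"): 25,
    ("design", "security_review"): 10,  # network changes need sign-off
    ("security_review", "dev"): 10,
    ("dev", "verify"): 40,
    ("verify", "deploy"): 15,  # the bottleneck
}

print(max_flow(sdlc, "triage", "deploy"))  # 15
faster_dev = {**sdlc, ("dev", "verify"): 80}
print(max_flow(faster_dev, "triage", "deploy"))  # still 15
wider_verify = {**sdlc, ("verify", "deploy"): 25}
print(max_flow(wider_verify, "triage", "deploy"))  # 25
```

Doubling the capacity from development to verification changes nothing, because verification-to-deploy is the minimum cut. Widening that edge is the only change here that raises throughput, and only until it passes 30, at which point triage-to-design becomes the new minimum cut.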
The book goes through some practical examples as Bill stumbles his way towards identifying these bottlenecks in his organization, which the book calls constraints. He then has to perform an exercise in humility and “subordinate to the constraint”, which means accepting that until he protects the flow of work through this constraint, the rest of his system is going to be plagued by work in progress piling up, or sit idle waiting for input. When his initial decisions to protect the constraint are tested almost immediately, the outcome depends on how committed his team is to those decisions.
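Here is a rough simulation of what subordinating to the constraint buys you, assuming a simple linear pipeline of three hypothetical stations where the middle one is the constraint. Each tick, every station finishes as much of its queue as its capacity allows and hands the finished work to the next station; the capacities and arrival rates are invented.

```python
def simulate(capacities, arrivals_per_tick, ticks):
    """Push work through a linear pipeline of stations, one tick at a time.

    capacities[i] is how many items station i can finish per tick.
    Returns (queue in front of each station, total items shipped).
    """
    queues = [0] * len(capacities)
    shipped = 0
    for _ in range(ticks):
        queues[0] += arrivals_per_tick
        # Process stations from last to first, so an item handed on this
        # tick is not processed again until the next tick.
        for i in reversed(range(len(capacities))):
            done = min(queues[i], capacities[i])
            queues[i] -= done
            if i + 1 < len(capacities):
                queues[i + 1] += done
            else:
                shipped += done
    return queues, shipped


# Station 1 (capacity 4) is the constraint.
print(simulate([10, 4, 8], arrivals_per_tick=6, ticks=10))  # ([0, 24, 4], 32)
print(simulate([10, 4, 8], arrivals_per_tick=4, ticks=10))  # ([0, 4, 4], 32)
```

Releasing six items a tick ships no more than releasing four. The constraint sets the pace either way; the extra work only piles up in front of it as work in progress, while the final station runs at half its capacity.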
One key takeaway I’d like to keep learning about is his identification of a particular individual, Brent, as a constraint. Brent is an individual contributor who has, over the years, built up an encyclopaedic knowledge of the system. This knowledge grows into a curse, as the organization finds itself unable to do anything without Brent’s involvement. I’ve heard, and lived through, many retellings of Person-As-Constraint, and I’d love to hear about books that are specifically about this concept.
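One cheap way to start spotting a Person-As-Constraint is to map each system to the people who can work on it and flag the systems with exactly one name. This is a hypothetical sketch: the systems, the mapping and everyone other than Brent are invented, not taken from the book.

```python
from collections import Counter


def single_points_of_knowledge(experts):
    """Return {system: person} for every system only one person can work on."""
    return {
        system: next(iter(people))
        for system, people in experts.items()
        if len(people) == 1
    }


def load_on_sole_experts(experts):
    """Count how many systems depend on each sole expert."""
    return Counter(single_points_of_knowledge(experts).values())


# Hypothetical knowledge map: system -> people who can work on it.
experts = {
    "payroll": {"Brent", "Ana"},
    "order_db": {"Brent"},
    "network_config": {"Brent"},
    "web_frontend": {"Ana", "Raj"},
    "build_servers": {"Raj"},
}

print(single_points_of_knowledge(experts))
# {'order_db': 'Brent', 'network_config': 'Brent', 'build_servers': 'Raj'}
print(load_on_sole_experts(experts))  # Counter({'Brent': 2, 'Raj': 1})
```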
The second way: to minimize wasted flow, shorten feedback cycles
The flow of work idealized by the book starts from the source and progresses monotonically towards the sink, which is the customer. In this view, any time work is sent back to an earlier step it is an exception, and a waste of flow, which is a waste of productivity.
One example of wasted flow in engineering would be a critical bug caught during a smoke test just after a deploy. All of the steps between where that bug was introduced and the deploy will need to be redone, so the work already spent on them is wasted. Such steps might include code review, unit tests, integration tests, manual testing in a pre-production environment and regulatory sign-off. This is just the cost of the work that was invalidated by this error, and does not include the cost of actually correcting the error.
Therefore, it becomes vitally important to detect such failures as quickly as possible by shortening the cycles of feedback between stations. This is such an important concept that I believe it has been independently discovered multiple times, and in software it gives us practices such as contract-driven development, test-driven development and continuous integration.
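The arithmetic behind the smoke-test example can be sketched as follows. The stage order follows the list above, but the hours are invented, and, as in the text, the cost of actually fixing the bug is left out.

```python
def wasted_hours(stages, introduced_at, caught_at):
    """Hours of completed work invalidated by a defect.

    stages: ordered list of (name, hours) for one piece of work.
    Every stage after the one where the defect was introduced, up to but
    not including the stage that caught it, has to be redone.
    """
    names = [name for name, _ in stages]
    start = names.index(introduced_at) + 1
    end = names.index(caught_at)
    return sum(hours for _, hours in stages[start:end])


# Hypothetical hours spent on one change at each stage.
stages = [
    ("development", 16),
    ("code review", 2),
    ("unit tests", 1),
    ("integration tests", 3),
    ("manual testing", 6),
    ("regulatory sign-off", 8),
    ("deploy", 1),
    ("smoke test", 1),
]

print(wasted_hours(stages, "development", "smoke test"))         # 21
print(wasted_hours(stages, "development", "integration tests"))  # 3
```

Caught at the smoke test, the bug throws away 21 hours of downstream work; caught during integration tests, only 3. That gap is the case for shortening feedback cycles.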
One point that sticks with me from this Way is that, beyond the widely-adopted practices above for speeding up feedback, there may be many opportunities unique to the specifics of your organization. Wherever work is lost, there’s an opportunity for a shorter feedback cycle.
The third way: Continuously learn and experiment
The third way mandates that we continuously increase the capabilities of the system by taking risks and making changes. This naturally follows from the first two ways, since once you’ve eradicated a constraint in your system, the next slowest thing becomes your new constraint, and your shorter feedback cycles will help you find that faster.
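A toy version of this loop, assuming a linear pipeline whose throughput is simply its slowest station, and a hypothetical improvement that doubles whichever station is currently the constraint. The station names and capacities are invented.

```python
def find_constraint(capacities):
    """In a linear pipeline, the constraint is the slowest station."""
    return min(capacities, key=capacities.get)


def improve_repeatedly(capacities, rounds, factor=2):
    """Elevate the current constraint `rounds` times.

    Returns the (station, capacity) of each constraint found, plus the
    pipeline's final throughput.
    """
    caps = dict(capacities)
    history = []
    for _ in range(rounds):
        station = find_constraint(caps)
        history.append((station, caps[station]))
        caps[station] *= factor  # hypothetical fix: multiply its capacity
    return history, min(caps.values())


pipeline = {"design": 14, "development": 9, "review": 6, "deploy": 20}
print(improve_repeatedly(pipeline, rounds=3))
# ([('review', 6), ('development', 9), ('review', 12)], 14)
```

Review turns out to be the constraint twice in three rounds, and throughput rises from 6 to 14 while the constraint keeps moving; there is always a next constraint to find.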
At some point, you’re bound to hit diminishing returns, where the cost or risk of making the next change outweighs the improvement you expect to gain from it. If-it-ain’t-broke-itis sets in.
The third way counteracts this by recommending: you should get better at making changes! Repetition is the mother of mastery, and so if you build a culture where changes are cheap to make, both in terms of up-front cost and potential risk, then the upper bound of what you’re likely to achieve is much higher.
By changing quickly, you achieve a higher total-flow terminal velocity.
It’s interesting that this needs to be called out explicitly as a Way, even though every engineering organization in the world is already awash in improvements and changes driven by technological shifts. The pace of improvement in software tools has never been higher, and we developers find ourselves in a rat race of frameworks, libraries and approaches.
The lesson I take from this is that learning has a kind of domain dependence, and it’s worth repeating the lesson so that we are called to apply it in every aspect. It’s not enough to have git replace svn, you need to become the organization that passes work between nodes in your value flow graph better because of git’s branching model. It’s not enough to change what you’re doing, you need to constantly re-evaluate how you’re doing it as well.
How to get the most out of this book
The way to get the most out of this book is to read it with your team over a period of several weeks. The Phoenix Project is ideally suited to a weekly book club or group study. The characters in the book are easy to identify with, and the episodes they go through, while dramatized, resonate with the day-to-day experiences of any engineering team. Because the book is fiction, it adds motivation to keep up with the readings, since people seem to make more effort to avoid plot spoilers than to avoid falling behind in a textbook.
At Big Health, we read about 3 chapters every week, and then met up in groups of around 4-6 people for a loosely structured chat over Google Hangouts. Every week, we had a healthy amount to discuss, as we attempted to pattern-match our own experiences against those of the book. The follow-up discussions from those meetings spilled out into one-on-ones, retrospectives, tickets and a host of other process changes and experiments. These changes are still running their course, and so we haven’t yet seen their full magnitude. Indeed, as we become more adept at the Third Way, the promise is that this improvement becomes a habit, and that the virtuous cycle snowballs.
I’m looking forward to seeing that.
What I’d like to learn next
After reading this book, there are a few areas of investigation that it opened up for me, and that I’d like to learn more about.
The first is to look more deeply into the aspects that this book intentionally disregarded. The book renewed my interest in the areas it glossed over: communication, negotiation and organizational change. Books like Getting to Yes and Never Split the Difference deal with communication and negotiation. The Power of Habit is a book I’ve previously read for advice on building personal habits, but it has a section on building the habits of an organization that will be worth a second visit for me.
I’d also like to learn about ways to avoid key person risk, or bus factor. Specifically in software, the tension between expertise and diligence on the one hand, and the need for knowledge sharing and avoiding individuals as constraints on the other, deserves further exploration.
One important follow-up question I have for this book comes from its focus on traditional companies with a software/technology function. In that world, technology projects originate from “the Business”, and then flow from left to right towards the customer. However, in the software firms that I have worked in, that model would introduce a very long and ineffective feedback loop.
The most vital companies I’ve worked for have used observability and experimentation to continuously validate different hypotheses of the system. At one point, one of these companies was running over a hundred A/B tests, trialing new looks and behaviour on different segments of our customers. While this book merges Development and Operations to form DevOps, I would include Research into this group as well, to form a kind of R-n-DevOps.
[1] It is a telling choice of words, and one that identifies the book’s target audience, that the engineering function of this organization is referred to as IT. It is responsible both for internal corporate IT tasks, such as printer security updates, furnishing personnel with laptops and intranet websites, and for external engineering development and operations. In the rest of this article, I’m going to use the term from my own professional culture of the Bay Area, which calls it engineering.