Flow, Not Heroics: What the Theory of Constraints Teaches IT Teams

Most IT work doesn’t move in straight lines.

It doesn’t look like a clean software pipeline or a factory stamping out identical parts. It looks like a job shop: a mix of projects, interruptions, dependencies, and shared resources, where every piece of work follows a slightly different path but relies on the same limited set of people and systems. That complexity is exactly why so many IT teams feel perpetually busy—and yet chronically behind.

The Theory of Constraints (TOC) exists to explain why.

These concepts aren’t new to IT or business in general. The Theory of Constraints was first introduced by Eliyahu M. Goldratt in The Goal, and later reframed in an IT context in The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford, and in The Unicorn Project by Gene Kim. If this post resonates, I strongly recommend reading one (or all) of them.

The Core Idea: Systems Are Limited by Their Slowest Step

In The Goal, Goldratt introduces a deceptively simple idea:

Every system has at least one constraint, and the output of the entire system is governed by that constraint—nothing else.

You can optimize every other part of the system, but if you don’t address the constraint, total throughput does not improve.

Goldratt puts it bluntly:

“An hour lost at a bottleneck is an hour lost for the entire system.”

That sentence is the foundation of everything that follows.
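
To make that concrete, here is a minimal sketch, assuming a hypothetical three-step pipeline with invented hourly capacities. The step names and numbers are mine for illustration, not from The Goal.

```python
# Hypothetical serial pipeline: capacity of each step in jobs per hour.
# Step names and numbers are invented for illustration.
capacities = {"intake": 10, "engineering": 3, "validation": 8}


def system_throughput(steps):
    """A serial pipeline can deliver no faster than its slowest step."""
    return min(steps.values())


def constraint(steps):
    """Name of the step with the lowest capacity."""
    return min(steps, key=steps.get)


baseline = system_throughput(capacities)

# Doubling a non-constraint step leaves total output unchanged.
faster_intake = {**capacities, "intake": 20}

# Adding capacity at the constraint is the only change that raises output.
faster_engineering = {**capacities, "engineering": 4}

print(constraint(capacities), baseline)
print("intake doubled:", system_throughput(faster_intake))
print("engineering +1:", system_throughput(faster_engineering))
```

In this toy model, doubling intake capacity leaves output at 3 jobs per hour, while one extra unit of engineering capacity lifts it to 4.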

A Simple Analogy: Making a Latte

Every morning, about ten minutes before I start work, I make a latte. I use a Breville Barista Express—good enough to be consistent, manual enough to require attention. Over time, I’ve dialed in the process so that the repeatable steps fade into the background and I can focus on what matters.

The workflow is straightforward, but recently I realized how cleanly it maps to flow and constraints.

From the outside—from the perspective of someone standing in line at a coffee shop—the process looks like this:

Order a drink
Wait
Receive the drink

That middle step—wait—is the only part anyone really cares about.

This is exactly how many people outside of IT experience IT work:

Request something
Wait
Receive the result

The natural reaction is to push harder on that waiting phase. Apply pressure. Ask for urgency. Start more things in parallel.

But that instinct is usually what makes the wait longer.

Why IT Feels Chaotic

IT work is job-shop work.

It doesn’t follow a single linear process. Depending on the problem, an IT project may involve completely different sequences of steps:

A server replacement touches procurement, staging, scheduling, cutover, and validation.
A data migration involves extraction, transfer, validation, retries, and cleanup.
A network refresh depends on access, vendor timelines, maintenance windows, and rollback planning.

Each job is unique—but they all compete for the same constrained resources:

Senior engineers
Maintenance windows
Vendor availability
Change approvals
Human attention

This is where traditional “stay busy” thinking starts to break down.

Just like a barista has to know how to prepare lattes, Americanos, and flat whites—each with different techniques and timing—IT engineers carry multiple distinct workflows in their heads. Each workflow contains steps that move quickly and steps that move slowly.

The slowest step in any given workflow is the constraint. It determines how fast the entire job can complete.

Meet Brent: The Constraint Made Human

In The Phoenix Project, the constraint isn’t a machine. It’s a person.

Brent is the engineer everyone depends on. He understands the systems no one else fully does. He gets pulled into outages, projects, escalations, meetings, and “quick questions” all day long.

What makes Brent critical isn’t that he works harder than everyone else—it’s that the system routes critical work through him.

Gene Kim summarizes the situation simply:

“If Brent is busy, the entire system slows down.”

The organization’s response is almost always predictable—and wrong:

Start more projects
Add more meetings
Apply more urgency
Rely on heroics

Each new task assigned to Brent just lengthens the queue in front of the bottleneck.

In every IT organization I’ve been part of, there has always been a Brent. Sometimes it’s a single individual. Sometimes it’s a rotating role. In larger environments, it can be an entire sub-team.

Once a backlog forms around that constraint, priorities stop being strategic and start being reactive. Work shifts to “who’s yelling the loudest” or “what fire just appeared,” and burnout quietly becomes part of the operating model.

Why Waiting Is Often the Longest Part of IT Work

This is the part where the Theory of Constraints becomes painfully concrete.

Take a data migration—something most systems engineers have done dozens of times.

From the outside, it looks simple:

Kick off the copy
Wait
Validate

From the inside, the reality is different:

The data transfer itself may take hours or days.
During that time, nothing downstream can complete.
Validation can’t finish.
Cutover can’t be scheduled.
Dependent work sits idle.

The constraint isn’t effort.
The constraint is elapsed time at a specific step.

Starting a second migration while the first one is still copying doesn’t make anything finish sooner. It just creates another item waiting for the same limited bandwidth, storage, validation window, or human oversight.

Goldratt calls this inventory.

In IT, inventory is unfinished work.
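
A back-of-the-envelope sketch of the migration example, assuming two hypothetical copies of equal size and a link that is split evenly between whatever copies are running. Real transfers also pay contention overhead, which this ignores, so it is if anything optimistic about the parallel case.

```python
def sequential_finish_times(sizes_gb, link_gb_per_hour):
    """Run copies one at a time; each copy gets the whole link."""
    finish, clock = [], 0.0
    for size in sizes_gb:
        clock += size / link_gb_per_hour
        finish.append(clock)
    return finish


def parallel_finish_times(sizes_gb, link_gb_per_hour):
    """Run all copies at once; the link is shared evenly among active copies.

    Finish times are returned smallest copy first.
    """
    remaining = sorted(sizes_gb)
    finish, clock, copied_each = [], 0.0, 0.0
    while remaining:
        share = link_gb_per_hour / len(remaining)
        smallest = remaining.pop(0)
        clock += (smallest - copied_each) / share
        copied_each = smallest
        finish.append(clock)
    return finish


sizes = [1000, 1000]  # hypothetical: two 1000 GB migrations
link = 100            # hypothetical: 100 GB per hour of usable bandwidth

print("one at a time:", sequential_finish_times(sizes, link))
print("both at once: ", parallel_finish_times(sizes, link))
```

Under these assumptions the last copy finishes at hour 20 either way, but running them one at a time delivers the first migration at hour 10 instead of hour 20. Starting both only delays the first finish.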

Why More Parallel Work Slows Everything Down

From the outside, it feels obvious: if someone has a free hour, give them another project. If there’s a lull while something is copying or waiting on a vendor, start the next task. Idle time looks inefficient. Movement looks productive.

The problem is that movement and progress are not the same thing.

In a job shop environment, starting more work doesn’t increase throughput. It increases fragmentation. Every additional project splits attention, introduces context switching, and creates another queue waiting on the same constrained resource. The bottleneck doesn’t move any faster—it just gets a longer line.

This is the part that feels counterintuitive. If everyone is busy, shouldn’t output increase?

What actually happens is the opposite. Lead times stretch. Work sits partially complete. Priorities blur. The real constraint becomes harder to see because everything looks equally urgent. And the moment something unexpected happens—an outage, a security incident, a leadership request—there’s no slack in the system to absorb it.

In The Phoenix Project, this is described as the illusion of progress. Improving parts of the system that aren’t the constraint might make people feel productive, but it doesn’t improve overall flow. It just creates more inventory waiting to be processed at the slowest step.

And in IT, uninterrupted focus is often the rarest resource of all.

If an engineer is juggling fifteen parallel initiatives, spending hours in status meetings, answering “quick questions,” and updating tracking systems, there is no sustained block of time left to actually push complex work across the finish line. Each project advances in inches instead of miles. Everything appears active, but nothing truly completes.

Parallel work feels efficient because it maximizes visible activity. In reality, it maximizes wait time.

This is where discipline becomes necessary.

When organizations push for maximum utilization, they unintentionally trade predictability for optics. Everything looks busy. Dashboards look full. People look engaged.

But finishing slows down.

Engineers feel constantly behind. Managers feel constantly surprised. The business experiences IT as inconsistent, even when everyone is working at full speed.

That gap—between visible effort and actual throughput—is what the Theory of Constraints is designed to close.

Utilization and Queue Wait Time

Queueing theory shows why high utilization creates this problem.

In The Phoenix Project, the team learns:

“The wait time is the percentage of time busy divided by the percentage of time idle.”

— paraphrased from the narrative

That means:

At 50% utilization, expected wait time ≈ 1 unit
At 90% utilization, expected wait time ≈ 9 units

Work sits in the queue nine times longer at 90% utilization than at 50%, simply because the resource is busier.
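
The rule of thumb is easy to tabulate. The sketch below is just the busy-over-idle ratio from the quote, and the function name is my own. The result is a relative factor, not a number of hours.

```python
def wait_factor(percent_busy):
    """Rule of thumb from the quote: percent busy divided by percent idle."""
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be at least 0 and below 100")
    return percent_busy / (100 - percent_busy)


for busy in (50, 80, 90, 95, 99):
    print(f"{busy}% busy -> wait factor {wait_factor(busy):.0f}")
```

The factor goes from 1 at 50% to 9 at 90%, then 19 at 95% and 99 at 99%. The curve is nearly flat at low utilization and close to vertical near the top.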

If your engineers are already 90% utilized—with meetings, reporting, and constant interruptions—when do they handle emergencies? When do they absorb sudden priority shifts? What work gets dropped?

Multiply this across multiple handoffs, and something with 30 minutes of actual hands-on effort can take days or weeks to finish.
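
Here is a rough sketch of that multiplication, using the same busy-over-idle rule of thumb. Everything in it is an assumption for illustration: the four handoffs, their utilization figures, and especially the choice that one "unit" of wait equals four working hours.

```python
# Hypothetical handoffs for one request:
# (step, hands-on minutes, percent busy of whoever performs the step).
handoffs = [
    ("triage", 5, 50),
    ("senior engineer", 15, 90),
    ("change approval", 5, 80),
    ("validation", 5, 75),
]

# Assumption: one unit of wait from the rule of thumb is 4 working hours.
HOURS_PER_WAIT_UNIT = 4


def wait_factor(percent_busy):
    """Percent busy divided by percent idle."""
    return percent_busy / (100 - percent_busy)


touch_hours = sum(minutes for _, minutes, _ in handoffs) / 60
queue_hours = sum(
    wait_factor(busy) * HOURS_PER_WAIT_UNIT for _, _, busy in handoffs
)
lead_hours = touch_hours + queue_hours

print(f"hands-on work: {touch_hours:.1f} h")
print(f"time in queues: {queue_hours:.1f} h")
print(f"total lead time: {lead_hours:.1f} h")
print(f"share of lead time spent working: {touch_hours / lead_hours:.1%}")
```

With these invented numbers, 30 minutes of hands-on work picks up 68 working hours of queueing, and more than half of that comes from the single step that is 90% busy.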

As utilization approaches 100%, wait time explodes.

And this is often the moment organizations start relying on heroics—late nights, escalations, “all hands” pushes—to compensate for a system that left no room for variability.

Idle time isn’t waste.

It’s the buffer that keeps systems responsive.

It’s what prevents the need for heroics in the first place.

Limiting Work in Progress Is About Protecting Flow

Limiting work in progress isn’t about slowing teams down. It’s about aligning the system to its constraints.

When you reduce how much work is in flight at once, ticket queues shrink. Bottlenecks become visible instead of buried. Lead times stabilize. Priorities become real instead of theoretical.

Most importantly, the constraint stops drowning.

In The Unicorn Project, this is framed as protecting flow time—the total elapsed time it takes for work to move from start to finish, including waiting.
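
One way to put numbers on this is Little's Law from queueing theory. It is not named elsewhere in this post, but it relates the same quantities: over the long run in a stable system, average flow time equals average work in progress divided by average throughput. A minimal sketch, assuming a hypothetical team that finishes five tickets a week no matter how many are open:

```python
def average_flow_time_weeks(work_in_progress, throughput_per_week):
    """Little's Law: average time in system = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput_per_week


THROUGHPUT = 5  # hypothetical: tickets finished per week

for wip in (10, 25, 50):
    weeks = average_flow_time_weeks(wip, THROUGHPUT)
    print(f"{wip} tickets in flight -> about {weeks:.0f} weeks per ticket")
```

If throughput is pinned by the constraint, the only lever left in this formula is the amount of work in flight. Cutting it from 50 tickets to 10 takes the average from about ten weeks to about two without anyone working faster.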

In the real world, companies like Starbucks succeed not because they are fast at any single step, but because they are consistent. A latte tastes the same and takes roughly the same amount of time at any location.

In IT, consistency builds trust. Predictable delivery, respected maintenance windows, and professional handling of exceptions do more for credibility than any heroic save ever will.

That consistency depends on the slack described earlier: a team running near full utilization has no room left to be predictable.

Why Heroics Make Systems Worse

IT cultures often reward the wrong behaviors:

The person who works nights
The one who “saves” failing projects
The engineer juggling ten efforts at once

The Theory of Constraints exposes the cost of that mindset.

Heroics:

Mask constraints by compensating for them with overtime
Encourage overcommitment and unrealistic promises
Increase fragility through rushed or incomplete solutions
Degrade quality by bypassing validation and consistency

When Brent is always saving the day, leadership never sees the bottleneck—because it’s being hidden by burnout.

The Five Focusing Steps

So what can we actually do about these issues?

The Theory of Constraints offers a loop, intended as a continuous feedback cycle rather than a one-time checklist:

Identify the constraint: Where does work wait the longest? Who is consistently overloaded?
Exploit the constraint: Remove distractions. Front-load prep. Eliminate unnecessary work.
Subordinate everything else: Stop starting work the constraint can’t absorb.
Elevate the constraint: Add capacity, automation, documentation, or reduce demand.
Repeat: A new constraint will emerge. That means you improved.

This is systems thinking, not micromanagement.
It’s also something that can—and should—be revisited in every post-project review.
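
For the first step, "where does work wait the longest?" can often be answered from data you already have. The sketch below assumes a hypothetical export from a ticketing system with one row per ticket per step. The ticket IDs, step names, and hours are all invented, and a real export will need its own parsing.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export: (ticket id, step, hours spent waiting before the step
# started). All values are invented for illustration.
wait_log = [
    ("T-1", "approval", 6), ("T-1", "senior review", 40), ("T-1", "validation", 4),
    ("T-2", "approval", 10), ("T-2", "senior review", 56), ("T-2", "validation", 8),
    ("T-3", "approval", 8), ("T-3", "senior review", 48), ("T-3", "validation", 3),
]


def average_wait_by_step(log):
    """Average queue time in hours for each step across all tickets."""
    waits = defaultdict(list)
    for _ticket, step, hours in log:
        waits[step].append(hours)
    return {step: mean(hours) for step, hours in waits.items()}


def likely_constraint(log):
    """The step where work waits longest on average."""
    averages = average_wait_by_step(log)
    return max(averages, key=averages.get)


for step, hours in average_wait_by_step(wait_log).items():
    print(f"{step}: {hours} h average wait")
print("likely constraint:", likely_constraint(wait_log))
```

Average wait is only a first-pass signal. A step can look slow because of one outlier, so treat the result as a starting point for a conversation with the people doing the work.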

What This Means in Practice

In real IT work—data migrations, server replacements, network cutovers—the value-added effort is often small compared to the waiting between steps.

Waiting for:

Data copies to complete
Validation to happen
Engineers to become available
Change windows to open

…is where the time actually goes.

Reducing utilization and limiting work in progress shortens those waits far more effectively than trying to make individuals “work faster.”

That’s the insight embedded in The Phoenix Project and The Unicorn Project: not as abstract theory, but as the explanation for why “simple” work takes so long.

Flow beats heroics.

Finish beats busy.

And constraints, once visible, stop being enemies and start becoming guides for how we actually run our teams.