At every resource, there is a positive probability of creating a defect, and that has to be captured in the process flow diagram. In this session, we will see how this can either take the form of scrapping the flow unit, which means we just drop it from the flow, or reworking the flow unit, which means repeating some of the operations that happened before the defect.
Either way, defects show up in the process flow diagram and have the potential to make a resource the bottleneck that, without these rework loops, would not be the bottleneck. Moreover, in addition to changing the path of the flow unit through the process flow diagram, defects also increase the variability in the process. Think about it: a defect probability of 10% at a resource doesn't mean that exactly every tenth unit is defective; it's a probabilistic statement, and that leads to variability in the flow.
And we already know that variability in flow is a big enemy of a lean operation. How do we deal with variability in flow? We're going to revisit our old tension between buffer or suffer, and we'll once again notice that there is no easy answer to this tension. Let's ignore the effects of variability for a moment and turn our attention to the following process.
It's a three-step process with processing times of five minutes at the first resource, four minutes at the second, and six minutes at the third. Where is the bottleneck, assuming that there is a defect probability of 50% at the second resource? Because of the 50% probability, half of the flow units have to be scrapped following the work at step two.
Well, in the past, we have done a simple capacity analysis. We have looked at the capacity of the resources as simply one over five units per minute, one over four units per minute, and one over six units per minute. If we multiply this by 60 minutes in an hour, we get capacities of twelve units per hour, fifteen units per hour, and ten units per hour.
So, would we really say that this third step is the bottleneck? We have not yet considered the effect of the scrap rate. We need to account for the fact that of the units that go through step two, only half will actually make it to step three. How do we factor that into the analysis?
I want you to think back to how we tackled process flow diagrams when we had multiple types of flow units. In that case, we often looked backwards and asked ourselves: what is the demand that each of the resources needs to serve? Now, assume that we have a demand of D flow units. How many units do we have to be serving at station number three? We have to be serving D units. Every unit that needs to get served, one for one, has to go through station number three. How does it work for station number two?
Well, I have to serve two times D in order to get D units of output for station three. Similarly, at station one, I also have to produce 2D, because half of the units will be scrapped at station two. If I now look at demand relative to capacity, which we back then referred to as the implied utilization, I can actually find the bottleneck: 2D divided by twelve, 2D divided by fifteen, and 1D divided by ten. So, the implied utilization is at its maximum at the first step over here, and we would call this step the bottleneck. You see how the scrap rate actually impacts where we have the bottleneck in our process.
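To make the arithmetic concrete, here is a minimal sketch of this scrap-rate analysis in Python; the variable names are my own illustration, not part of the lecture.

```python
# Three steps with processing times 5, 4, and 6 minutes per unit;
# 50% of units are scrapped after step two.
processing_times = [5, 4, 6]
capacities = [60 / t for t in processing_times]    # units per hour: 12, 15, 10

D = 1.0                                            # demand rate (units per hour)
demands = [2 * D, 2 * D, 1 * D]                    # steps 1 and 2 must process 2D to yield D good units

implied_utilization = [d / c for d, c in zip(demands, capacities)]
for step, iu in enumerate(implied_utilization, start=1):
    print(f"Step {step}: implied utilization = {iu:.3f}")
# Step 1 has the highest implied utilization (2D/12), so it is the bottleneck.
```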
Instead of scrapping a flow unit, I can also rework it.
Rework means I'm going to repeat one or several operations, and after the rework
is done, the flow unit that was previously defective becomes a good flow unit.
Consider the following example here. I have three process steps.
The first process step has a processing time of five minutes and never makes a defect. The second process step has a four-minute processing time and, with a 30% probability, makes a defect. However, I'm going to catch that defect right at the completion of step two, and if I repeat the operation, spending another four minutes, I'm going to fix the flow unit. Let's assume for now that the rework always works and fixes the unit in the first pass. And then, finally, I have a third step here in the process with a processing time of two minutes. Where is the bottleneck?
Well, a very naive analysis would suggest that we just take the processing times and compute one over the processing time to get the capacity levels. That means one over five units per minute, one over four, and one over two. However, rather obviously, this analysis misses the effect of rework. Rework can turn an activity that previously was a non-bottleneck step into the bottleneck, just because the rework loop sucks up extra capacity. So, a better analysis goes something like this. Instead of saying that my processing times are five minutes, four minutes, and two minutes, I have to acknowledge what happens at the second step. With a 70% probability, it will indeed take me four minutes; that means everything goes well. However, with a 30% probability, it's going to take me four plus four minutes. So, the expected processing time is simply 0.7 × 4 + 0.3 × 8, which is equal to 5.2 minutes. There is no impact on the processing times of the first and the last step, so those stay at five and two minutes. And I see here that the true capacities are really 1/5, 1/5.2, and 1/2, which reveals that the second step is really the bottleneck.
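As a quick numerical check, here is a small sketch of the expected-processing-time calculation; the code is my own illustration of the numbers above, not from the lecture.

```python
# Step 2 takes 4 minutes; with probability 0.3 the unit is defective and
# needs another 4 minutes of rework, which we assume always succeeds.
p_defect = 0.3
expected_time_step2 = (1 - p_defect) * 4 + p_defect * (4 + 4)   # 0.7*4 + 0.3*8 = 5.2

processing_times = [5, expected_time_step2, 2]        # minutes per unit
capacities = [1 / t for t in processing_times]        # units per minute: 1/5, 1/5.2, 1/2
bottleneck = min(range(len(capacities)), key=lambda i: capacities[i]) + 1
print(expected_time_step2, bottleneck)                # 5.2, step 2 is the bottleneck
```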
An alternative calculation to get to the same idea is to use the implied utilization calculation, which we had in the example of processes with multiple flow units. Just call the flow rate the demand. Let's assume that we're processing here at a certain unknown rate and that we're facing a certain demand rate. Let's call this demand rate D. If we start with D, we realize that the demand rate for the first step is simply D, and for the last step it's also going to be D. However, for the second step here, the demand rate is really 1.3D. That simply reflects that for 30% of the flow units, I'm going to add another round of processing at the second step, and so the flow there has to be 1.3D.
If you then recall our definition of implied utilization as the ratio of demand to capacity, you're going to get D divided by the capacity of 1/5, which is 5D; then 1.3D divided by the capacity of 1/4, which is 5.2D; and finally D divided by 1/2, which is 2D. So, you notice here that 5.2D is the highest implied utilization, making the second step indeed the bottleneck.
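The same conclusion in a few lines of Python, again only a sketch under the assumptions just described:

```python
D = 1.0                                # demand rate, units per minute (illustrative)
demands = [D, 1.3 * D, D]              # 30% of units visit step two a second time
capacities = [1 / 5, 1 / 4, 1 / 2]     # units per minute, without rework
implied_utilization = [d / c for d, c in zip(demands, capacities)]
print(implied_utilization)             # [5.0, 5.2, 2.0] times D
# Step two has the highest implied utilization, matching the
# expected-processing-time analysis above.
```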
Now that we understand how defects impact the bottleneck calculations, let us look at some of the economics of defects. What are the costs of making a defect?
Consider the following example. In a restaurant, I'm sourcing pasta from the market. I have somebody who brings the pasta from the market and puts it into the kitchen; that's the preparation step.
I then have a busy and famous chef who is preparing the meal.
Finally, somebody puts a meal on a plate and brings it to the customer.
We're charging $20 for each meal, and so we are enjoying quite a significant markup. Now, what is the cost of making a defect in this process? Is it driven by the $2 per meal that it costs us to source the pasta? Or is it driven by the $20 that we can make by selling the meal? The answer to this question depends on where the defect happens, or, more accurately, where the defect is identified. If the defect happens as we come back from the market and we drop some pasta on the floor, all we have to do is buy another round of pasta. So, as long as the defect happens before the bottleneck, it just costs us $2. If, however, the chef has already spent all the time preparing the meal, and the defect happens as we're serving the food to the customer, say the server drops the pasta on the floor, then we have to go back to our scarce resource, back to the bottleneck. Since the bottleneck is the constraint on the process, we really have to charge $20 for this type of defect.
So, you'll notice how defects that happen before the bottleneck are really just driven by the input prices. Defects that happen after the bottleneck, however, need to be valued at the cost, or more strictly speaking, the opportunity cost, of a unit of bottleneck capacity. Notice in this example that the crucial question is not where the defect occurred but where the defect was identified. If we buy bad pasta and we have the bottleneck spend time on preparing the meal, we catch the defect only after the bottleneck. It didn't matter that the defect occurred before the bottleneck; what mattered is that the bottleneck spent time on it. This drives the location of test points in the process: it's important that we test flow units as much as we can before we put them into the bottleneck.
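The pricing logic can be summarized in a few lines; the dollar amounts match the pasta example, while the function itself is just my own illustration.

```python
def cost_of_defect(identified_before_bottleneck: bool,
                   input_cost: float = 2.0,        # $ to re-source the pasta
                   selling_price: float = 20.0) -> float:
    """Cost of one defect, depending on where it is identified.

    Caught before the bottleneck: we only lose the input cost.
    Caught at or after the bottleneck: we lose scarce bottleneck time,
    so the defect is valued at the opportunity cost of a lost sale.
    """
    return input_cost if identified_before_bottleneck else selling_price

print(cost_of_defect(True))    # defect caught before the chef: $2
print(cost_of_defect(False))   # defect caught after the chef:  $20
```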
Now, let's turn to the effect of defects creating variability in a process.
Consider the following example of a 2-step process.
Both steps have a processing time of five minutes.
Both steps also have a probability of 50% of creating a defect.
In that case, the processing time goes up to ten minutes, because the unit requires an extra five minutes of rework. What's the flow rate going to be in this process? Well, let's think through all the scenarios that can happen. In the best case, both of them create a good flow unit; I write G here for a good flow unit. That happens with a probability of 1/4, and in this case, we're going to get a flow of one unit every five minutes. However, it's also possible that the first step messes up and the second one completes its unit correctly, which also has a probability of one over four. Now, what happens? In this case, the first resource will take ten minutes to complete the work. However, the second resource is done after five minutes and is then out of work. We say that this process is starved at resource two. Resource two is starved of work, and the flow rate goes down to one over ten.
Now, the opposite can of course happen as well: the second step messes up and the first one operates correctly. Again, this happens with a probability of one-fourth, and in this case, you notice that the first step doesn't have any place to put the flow unit, right? The second resource is still busy working on its own flow unit. And so, we say that in this case, the first resource is blocked. And then, finally, of course, if both of them mess up, which also happens with a probability of one-fourth, I again only get the process completed every ten minutes. So, you notice that on average, the flow rate is dramatically slower than one unit every five minutes.
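Here is a back-of-the-envelope version of that scenario analysis, under the simplifying assumption from the lecture that each five-minute cycle resolves on its own; it is only a sketch, not a full queueing model.

```python
from itertools import product

# Two steps, 5 minutes each; with probability 0.5 a step needs 5 extra
# minutes of rework. With no buffer, the pair only moves on when the
# slower of the two steps is finished.
scenarios = list(product([5, 10], repeat=2))      # (step 1 time, step 2 time)
prob = 0.25                                        # each of the four scenarios
expected_cycle = sum(prob * max(t1, t2) for t1, t2 in scenarios)
print(expected_cycle)        # 0.25*5 + 0.75*10 = 8.75 minutes per unit
print(1 / expected_cycle)    # ~0.11 units per minute, well below 1/5
```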
That is because of the variability. If we want to avoid blocking and starving, we can put a buffer between the first step and the second step. This gets us back to the idea of buffer or suffer. The bigger the buffer, the lower the likelihood that the first step gets blocked or the second step gets starved. So, inventory protects us from variability and thus helps us obtain a good flow rate.
Consider the following metaphor from the lean literature. Think about inventory as water in a river, and imagine we're operating a boat on that river or canal. In the river, unfortunately, there are a bunch of rocks. These rocks correspond to all the hiccups that can happen in a process, such as defects, setup times, and other complications. Now, on the one hand, we as the operators of this boat prefer not to bump into these rocks. And so, we have an incentive to put a lot of water in the river so that we float high above the rocks. The problem, however, is that with so much water in the river, we never see the rocks. The rocks are not exposed, and they stay in the river forever.
The same applies to inventory. If we put so much inventory into the system that we buffer every eventuality out of the way, we start taking the pressure off process improvement. So, the Toyota Production System and the lean literature argue exactly the opposite: instead of buffering systems, we should reduce the buffers so that we expose problems. We should do this gradually. Instead of taking all the water out of the river at once, we reduce the water level step by step and expose the most significant rock. Once we've identified the rock, we go after the underlying root cause, we get rid of the rock, and again start lowering the water
level. So, as you notice that this creates a
certain paradox. On the one hand, you need some inventory
in the process just to lubricate the flow and to deal with the buffer or suffer
paradox. But, on the other hand, too much inventory
in the system, and people start slacking off.
They have no reason anymore to work hard or to improve the process any further.
To control the amount of inventory in the process, the Toyota Production System has developed the concept of Kanban cards. Imagine we're selling these beautiful little black boxes to our customer, and the end-customer demand is here on the right. We have an internal supplier, another department, that supplies us with containers that each hold nine of these black boxes. Once we have emptied one of these containers, we're going to take the card or sticker that is attached to the container and move that card to the department that is feeding us with the black boxes. This sticker or card is called the Kanban card. It's also known as the work authorization form. Now, the folks in the department supplying us have to sit there idle, waiting until we provide them with the Kanban card. Idle time looks very unproductive, but it's certainly better than producing inventory. So, only when these folks receive our Kanban cards, our work authorizations, are they able to produce the next set of nine units.
By definition, the work-in-process inventory is never bigger than the number of Kanban cards. This allows us to keep a cap on inventory and to control the inventory level, just like with the boat on the river on the previous slide. In the early days of the process, when there are still defects or other problems, we issue an extra couple of cards. As the process gets better, we remove some cards from the system, and thereby we put more pressure on process improvement.
Now, you notice how this system implements a pull system. Rather than everybody in the process working as hard as they want and pushing the work forward, it is the demand that drives the system. Through the Kanban cards, the system pulls the work through the process as opposed to pushing it.
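To see how the cards cap the work-in-process, here is a tiny sketch of the pull logic; the card count and the class and method names are illustrative only, not part of the lecture.

```python
class KanbanLoop:
    """Minimal pull loop: WIP (in containers) can never exceed the card count."""

    def __init__(self, num_cards: int, container_size: int = 9):
        self.free_cards = num_cards      # cards currently authorizing new work
        self.full_containers = 0         # finished containers waiting downstream
        self.container_size = container_size

    def supplier_produce(self) -> bool:
        # The supplying department only starts a container when it holds a
        # free card; otherwise it sits idle rather than building inventory.
        if self.free_cards == 0:
            return False
        self.free_cards -= 1
        self.full_containers += 1
        return True

    def consumer_empty_container(self) -> None:
        # Emptying a container sends its card back upstream; that card is
        # the signal (the pull) to produce the next nine units.
        if self.full_containers > 0:
            self.full_containers -= 1
            self.free_cards += 1

loop = KanbanLoop(num_cards=3)
while loop.supplier_produce():
    pass
print(loop.full_containers)   # stops at 3: WIP never exceeds the number of cards
```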
Buffer or suffer: defects provide a reason for us to hold inventory. Since we as managers like a good, continuous flow through the process, it is tempting for us to buffer the flow. These buffers then prevent resources from either starving or getting blocked. The flow must go on, and so buffers help with the flow. The problem, however, is that buffers are not particularly lean. In fact, I would argue they're about as unlean as it gets.
And if you remember our discussion of the seven sources of waste from the productivity module, inventory is really the worst of all the seven sources of waste. One reason is that buffers make us comfortable with slack, and they make it more likely that the operators in the process get used to defects as opposed to continuously trying to improve the process. We saw that Kanban is, for us as managers,
a way to keep a cap on inventory and to control how much inventory can be in the system. By setting the right number of Kanbans, we can adjust the inventory in the process to our current process capability. The Kanban process is based on the idea of pull. Instead of pushing work into the process, Kanban implements a pull system where machines and operators only get activated if there is downstream demand for them, not because of idle time or input