This article describes issues with the process capability indices Cp, Cpk, Pp, and Ppk, and a resolution approach to process performance reporting that provides improved process understanding.
Process Performance Reporting Issues
Process capability or process performance relative to customer desires can be reported using process capability indices (e.g., Cp, Cpk, Pp, Ppk). Other approaches to describing how a process is performing include bar charts, pie charts, red-yellow-green scorecards (or stoplight scorecards), and tables of numbers. For a given process, each of these reporting methods can provide a very different, and somewhat subjective, picture of how a process is performing and whether any action should be taken. [1]
In addition, these reports describe historically what happened, which may not be representative of the future. What is really desired is a futuristic statement about what is expected from the current process state so that appropriate adjustments can be made if what is “seen out the windshield” is undesirable.
Metric reporting should lead to the most appropriate action or non-action; however, often process-metric decisions are a function of how an individual or a group chose its process sampling, data analysis, and reporting procedures. From a conceptual Measurement Systems Analysis (MSA) point of view, the reporting of process performance should be independent of the person who is doing the sampling and reporting. What would be desirable is a predictive system where the only difference between individual-process reporting, in a particular time frame, would be from chance sampling variability.
In the next section of this paper, I will elaborate more on the magnitude of the issue with focus given to illustrating how Cp, Cpk, Pp and Ppk process capability indices’ reported results can be sensitive to how a given process is sampled; i.e., a conceptual MSA issue. A predictive metric reporting system will then be described for overcoming not only the issues with process capability indices reporting but business-performance scorecards in general.
Organizations can benefit when managers utilize the described predictive measurement reporting system throughout their business functional process map. [2] Practitioners can enhance the understanding of the benefits of this system by providing illustrative report-outs that compare current scorecard metric reporting to this predictive-performance metric reporting system.
Process Capability Indices (Cp, Cpk, Pp and Ppk) Issues and Resolution
The process capability index Cp represents the allowable tolerance interval spread in relation to the actual spread of the data when the data follow a normal distribution. This equation is
$$C_p = \frac{USL - LSL}{6\sigma}$$
where USL and LSL are the upper specification limit and lower specification limit, respectively, and the spread of the distribution is described as six times standard deviation; i.e., 6σ.
Cp addresses only the spread of the process; Cpk is used to address the spread and mean (μ) shift of the process concurrently. Mathematically, Cpk can be represented as the minimum value of the two quantities
$$C_{pk} = \min\left[\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right]$$
Pp and Ppk indices are sometimes referred to as long-term capability or performance indices. The relationship between Pp and Ppk is similar to that between Cp and Cpk. The index Pp represents the allowable tolerance spread relative to the actual spread of the data when the data follow a normal distribution. This equation is
$$P_p = \frac{USL - LSL}{6\sigma}$$
where USL and LSL are the upper specification limit and lower specification limit. No quantification for data centering is described within this Pp relationship.
Mathematically, Ppk can be represented as the minimum value of the two quantities
$$P_{pk} = \min\left[\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right]$$
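As a minimal sketch of these relationships, the following Python function computes the spread index and the centered index from the specification limits, a process mean, and a standard deviation. The function name and the mean and standard deviation values are illustrative assumptions, not values from this article's data; only the 95 to 105 specification is taken from the discussion later in this section. Whether the result is read as (Cp, Cpk) or (Pp, Ppk) depends entirely on which standard deviation estimate is passed in.

```python
def capability_indices(usl, lsl, mean, sigma):
    """Return (spread index, centered index) for the given limits, mean and sigma.

    With a within-subgroup sigma estimate the result is (Cp, Cpk);
    with an overall sigma estimate the same formulas give (Pp, Ppk).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    spread_index = (usl - lsl) / (6 * sigma)
    centered_index = min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))
    return spread_index, centered_index


# Illustrative mean and sigma (assumed values, not from the article's data set).
cp, cpk = capability_indices(usl=105, lsl=95, mean=101, sigma=2)
print(f"Cp = {cp:.3f}, Cpk = {cpk:.3f}")
```

Because the mean in this made-up case sits closer to the upper limit, the centered index is smaller than the spread index.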
Let’s consider the confusion encountered with regard to the calculation of the seemingly simple statistic, standard deviation. Although standard deviation is an integral part of the calculation of process capability, it seems to me that the method used to calculate the value is rarely adequately scrutinized.
In some cases, it is impossible to get a specific desired result if data are not collected in the appropriate fashion. Consider the following three sources of continuous data:
- Situation 1. An x-bar and R control chart with subgroups of sample size 5.
- Situation 2. An X chart with individual measurements.
- Situation 3. A random sample of measurements from a population.
For these three situations, the standard deviation estimate ($\hat{\sigma}$) used for Cp, Cpk, Pp, and Ppk is determined through the relationships shown in Table 1:
* Statistical computer programs will sometimes pool standard deviations for un-biasing reasons when there are m subgroups of sample size n, resulting in a slightly different value for standard deviation.
Table 1: Cp, Cpk, Pp and Ppk relationships
In Table 1, $\bar{x}$ is the overall sample mean, $x_i$ is an individual sample (i) from a total sample size N, $\bar{R}$ is the mean subgroup range, $\overline{MR}$ is the mean range between adjacent subgroups, and d2 is a factor for constructing variables control charts; e.g., d2 equals 1.128 for a two-observation sample and 2.326 for a five-observation sample.
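The estimators implied by these symbol definitions can be sketched in Python as follows. This is a sketch under the conventional control-chart assumptions: the mean range divided by d2 for subgrouped data, the mean moving range divided by 1.128 for individual measurements, and the usual N − 1 sample standard deviation for the overall estimate. The function names and the measurements are hypothetical, and the d2 values for subgroup sizes 3 and 4 are standard constants that are not quoted in this article.

```python
import math

# d2 control-chart constants; 1.128 and 2.326 are quoted in the text,
# 1.693 and 2.059 are the standard values for sizes 3 and 4.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_from_subgroup_ranges(subgroups):
    """Situation 1: mean subgroup range divided by d2 (equal-size subgroups)."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    r_bar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
    return r_bar / D2[n]


def sigma_from_moving_ranges(values):
    """Situation 2: mean range between adjacent individual values divided by 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(moving_ranges) / len(moving_ranges)) / D2[2]


def sigma_overall(values):
    """Overall sample standard deviation with the N - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


# Made-up measurements: two subgroups of five.
groups = [[99.0, 101.0, 100.0, 102.0, 98.0], [100.0, 103.0, 99.0, 101.0, 97.0]]
flat = [x for g in groups for x in g]
print(sigma_from_subgroup_ranges(groups), sigma_from_moving_ranges(flat), sigma_overall(flat))
```

A purely random sample (Situation 3) has no time order or subgroups, so only the overall estimate can be computed from it.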
The following example data set will be used to illustrate the impact that different data collection techniques can have on reported process capability metrics.
Table 2: Process Time-series Data
When reporting process capability indices, it is important that the data from which the metric is calculated are from a stable process; i.e., the process is in control. These data were used in another article, X-bar and R Control Chart: Issues and Resolution, to compare traditional x-bar and R process stability assessment to high-level 30,000-foot-level operational-metric reporting. In that article, a traditional control chart indicated that the process was out of control, while 30,000-foot-level reporting indicated that the process was in control. The article also described the advantages of a 30,000-foot-level assessment when compared to traditional reporting. The following discussion will presume that the process is considered stable.
To quantify the capability of this process, someone could have chosen to select only one sample instead of five for each subgroup. These two scenarios would result in the following standard deviation calculations:
(Consider that sample one in the above table was the individual reading for each subgroup)
For a specification of 95 – 105, a statistical analysis program used standard deviations similar to these when determining the process capability results, as shown in Figures 1 and 2.
Figure 1: Process Capability for Five-sample subgroup
Figure 2: Process Capability for One-sample subgroup
Table 3 summarizes the process capability results shown in Figures 1 and 2.
Table 3: Summary of Cp, Cpk, Pp and Ppk Values from the Analyses
From this table, we note a large difference between the Cp and Cpk values for a subgrouping sample size of one versus five. An examination of the standard deviation equations provides the reason for this disparity: the Cp and Cpk calculations that used an x-bar and R chart had their standard deviation determined from the variation within subgroups, while for the individuals chart, standard deviation was calculated between subgroups.
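The effect can be reproduced with a small Python sketch. The data below are synthetic and chosen only for illustration (they are not the Table 2 time-series data): each subgroup of five is tightly clustered, but the subgroup level moves from one subgroup to the next. The same process is then summarized under the two sampling plans.

```python
D2 = {2: 1.128, 5: 2.326}  # control-chart constants for ranges of 2 and 5 observations

# Synthetic data (an assumption for illustration, not the article's data set).
offsets = [0, 3, -2, 4, -3, 2, -4, 1]
within = [-1.0, -0.5, 0.0, 0.5, 1.0]
subgroups = [[100 + off + w for w in within] for off in offsets]

# Plan A: five samples per subgroup, sigma estimated within subgroups (R-bar / d2).
r_bar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
sigma_five = r_bar / D2[5]

# Plan B: one sample per subgroup (the first reading), sigma estimated from the
# moving range between adjacent subgroups (MR-bar / d2).
singles = [g[0] for g in subgroups]
moving_ranges = [abs(b - a) for a, b in zip(singles, singles[1:])]
sigma_one = (sum(moving_ranges) / len(moving_ranges)) / D2[2]

usl, lsl = 105, 95
cp_five = (usl - lsl) / (6 * sigma_five)
cp_one = (usl - lsl) / (6 * sigma_one)
print(f"five per subgroup: sigma = {sigma_five:.2f}, Cp = {cp_five:.2f}")
print(f"one per subgroup:  sigma = {sigma_one:.2f}, Cp = {cp_one:.2f}")
```

In this contrived case the five-sample plan reports a Cp near 1.9 while the one-sample plan reports a Cp below 0.4 for the same process, because only the second plan lets between-subgroup variation into the standard deviation estimate.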
With a good conceptual MSA system, process sampling plans should have no effect on process performance statements; any differences should be only the result of luck-of-the-draw sampling. Because of the differences noted above, we can conclude, in general, that process capability reporting can have MSA issues, since a sample of five versus one did not provide similar answers.
One might note that, in this analysis, Pp and Ppk are similar for the two sampling procedures. However, as was shown in X-bar and R Control Chart: Issues and Resolution, an x-bar and R control chart analysis would indicate that the process was out of control; hence, a process capability analysis would not be appropriate for this form of control-charting analysis. This reference also notes technical reasons why individuals control charting is preferred over x-bar and R control charting.
Other conceptual MSA issues with process capability indices reporting include:
- If data are not normally distributed, the above equations are not valid.
- The physical implication of reported process capability indices is uncertain and possibly wrong.
- Without an accompanying statement of process stability, from a control chart, all process capability indices are of a questionable value. Any process capability assessment of an unstable process is improper and often deceptive.
- Process capability indices do not provide a predictive performance statement.
Next, a predictive performance metric reporting methodology that addresses these issues is described.
Predictive Performance Metric Reporting Alternative to Cp, Cpk, Pp, and Ppk
From a conceptual MSA point of view, there are three questions that should be addressed during statistical business performance charting (SBPC), or 30,000-foot-level [2], tracking and reporting for both transactional and manufacturing process outputs. These questions are:
- Is the process unstable or did something out of the ordinary occur, which requires action or no action?
- Is the process stable and meeting internal and external customer needs? If so, no action is required.
- Is the process stable but does not meet internal and external customer needs? If so, process improvement efforts are needed.
Process performance reporting using process capability indices, bar charts, pie charts, red-yellow-green scorecards, or a table of numbers can provide very different process performance assessments, a conceptual MSA issue, and, in addition, does not structurally address the three described action options. [2]
The following illustrates a system for describing a process output performance from a high-level airplane-in-flight view, or 30,000-foot-level. For this SBPC reporting, the subgrouping frequency of an individuals control chart is chosen so that typical variability from input variables occurs between subgroups.
Data from regions of stability can be used to estimate the non-conformance rate of a process during those timeframes. If there is a recent region of stability, data from this region can be considered a random sample of the future, from which a prediction statement can be made. This prediction statement presumes that no fundamental positive or negative changes will occur in the future, relative to the process inputs or its execution steps.
If, at some point in time, the output of a stable process is performing at an undesirable non-conformance level, an organization can initiate an improvement project (e.g., Lean Six Sigma project) with the intent to change process inputs or steps to improve a process performance level.
For continuous data, a probability plot can provide an estimate of the process non-conformance rate in either percentage or DPMO (defects per million opportunities) units. For attribute data, the process-estimated non-conformance rate is simply the overall combined subgroup failure rate in the region of process stability.
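For continuous data, one numerical stand-in for reading the rate off a probability plot is to fit a normal distribution to the data from the stable region and compute the area outside the specification limits. The sketch below takes that route, which is a swapped-in simplification and assumes the data are approximately normal; the mean and standard deviation are made-up values, not estimates from this article's data set.

```python
from statistics import NormalDist


def nonconformance_rate(mean, sigma, lsl, usl):
    """Estimated fraction of output outside [lsl, usl] for a normal process."""
    dist = NormalDist(mu=mean, sigma=sigma)
    below = dist.cdf(lsl)
    above = 1.0 - dist.cdf(usl)
    return below + above


# Hypothetical stable-region estimates (assumed for illustration only).
rate = nonconformance_rate(mean=100.0, sigma=4.0, lsl=95.0, usl=105.0)
print(f"{rate * 100:.3f}% nonconforming, about {rate * 1e6:.0f} parts per million")
```

The same fraction can be reported as a percentage or scaled to a per-million rate; the prediction holds only while the process remains in the same region of stability.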
Figure 3 illustrates the 30,000-foot-level charting of the data shown in Table 2. It is important to note that the estimated non-conformance rate of 26.852% reported in Figure 3 is similar to the PPM total rate of 268525.98 reported in Figure 1.
Figure 3: 30,000-foot-level Chart of Data from Table 2 [3]
Summary: Process Capability Cp, Cpk, Pp, Ppk Issues and Resolution
When reporting how a process is performing using capability indices (Cp, Cpk, Pp, Ppk), the magnitude of the reported metrics for a given situation can be a function of sampling procedures. For example, different conclusions could be made when process data are analyzed from an individuals chart report-out (one sample per subgroup) versus an x-bar and R chart report-out (multiple samples per subgroup); i.e., a conceptual process performance MSA issue.
Traditional organizational performance measurement reporting systems can utilize a table of numbers, stacked bar charts, pie charts, and red-yellow-green goal-based scorecards. For a given situation, one person may choose one reporting scheme, while another uses a completely different approach. These differences can lead to different conclusions about what is happening and what should be done.
In addition, the described traditional reporting methods provide only an assessment of historical data and make no predictive statements. Using this form of metric reporting to run a business is not unlike driving a car by only looking at the rearview mirror, a dangerous practice.
When a predictive 30,000-foot-level charting system is used to track interconnected business process map functions, an alternative forward-looking dashboard performance-reporting system becomes available. With this 30,000-foot-level metric system, organizations can systematically evaluate future expected performance and make appropriate adjustments if they don’t like what they see, not unlike looking out a car’s windshield and turning the steering wheel or applying the brake when adjustments are needed.
Business Benefiting from Application of 30,000-foot-level Predictive Performance Metric Reporting
Organizations benefit when 30,000-foot-level techniques are integrated within a business system that analytically/innovatively determines strategies with the alignment of improvement projects that positively impact the overall business. Integrated Enterprise Excellence (IEE) provides a system for this integration.
Businesses experience improvements in their financials when they incorporate measurement reporting, through dashboards and scorecards, that leads to the most appropriate behaviors. Traditional dashboards and scorecards can be transitioned to 30,000-foot-level predictive performance metric reporting, as shown in the ten illustrations available through the article Predictive Performance Dashboard Scorecard Reporting.
1. Forrest W. Breyfogle III, Integrated Enterprise Excellence Volume III – Improvement Project Execution: A Management and Black Belt Guide for Going Beyond Lean Six Sigma and the Balanced Scorecard, Bridgeway Books/Citius Publishing, 2008.
2. Forrest W. Breyfogle III, Integrated Enterprise Excellence Volume II – Business Deployment: A Leaders’ Guide for Going Beyond Lean Six Sigma and the Balanced Scorecard, Bridgeway Books/Citius Publishing, 2008.
3. Figure created using Enterprise Performance Reporting System (EPRS) software.
The Problem of Long-Term Capability
Poor labels lead to incorrect ideas
Published: Monday, July 8, 2013 - 14:38
Based on some recent inquiries there seems to be some need to review the four capability indexes in common use today. A clear understanding of what each index does, and does not do, is essential to clear thinking and good usage. To see how to use the four indexes, to tell the story contained in your data, and to learn how to avoid a common pitfall, read on.
The four indexes
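Four indexes in common use today are the capability ratio, Cp; the performance ratio, Pp; the centered capability ratio, Cpk; and the centered performance ratio, Ppk. The formulas for these four ratios are:
$$C_p = \frac{USL - LSL}{6\,\mathrm{Sigma}(X)} \qquad P_p = \frac{USL - LSL}{6\,s}$$
$$C_{pk} = \frac{2\,DNS}{6\,\mathrm{Sigma}(X)} \qquad P_{pk} = \frac{2\,DNS}{6\,s}$$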
To understand these ratios we need to understand the four components used in their construction. The difference between the specification limits, USL – LSL, is the specified tolerance. It defines the total space available for the process.
The distance to the nearer specification, DNS, is the distance from the average to the nearer specification limit. Operating with an average that is closer to one specification than the other effectively narrows the space available to the process. It is like having a process that is centered within limits that have a specified tolerance = 2 DNS. Thus, the numerator of both the centered capability ratio and the centered performance ratio characterizes the effective space available due to the fact that the process is not centered within the actual specification limits.
Sigma(X) denotes any one of several within-subgroup measures of dispersion. One such measure would be the average of the subgroup ranges divided by the appropriate bias correction factor. Another such measure is the average of the subgroup standard deviation statistics divided by the appropriate bias correction factor. The quantity denoted by 6 Sigma(X) represents the generic space required by a process when that process is operated up to its full potential.
The global standard deviation statistic, s, is the descriptive statistic introduced in every statistics class. Since it is computed using all of the data, it effectively treats the data as one homogeneous group of values. This descriptive statistic is useful for summarizing the past, but if the process is not being operated up to its full potential the changes in the process will tend to inflate this global measure of dispersion. Thus, this measure of dispersion simply describes the past without respect to whether the process has been operated up to its full potential or not. The denominators of 6s define the space used by the process in the past.
A glance at the formulas above will reveal that the only difference between the capability indexes and the corresponding performance indexes is simply which measure of dispersion is used. The performance indexes use the global standard deviation statistic to describe the past. The capability indexes use a within-subgroup measure of dispersion to approximate the process potential. Whenever and wherever this profound difference between these measures of dispersion is not appreciated it is inevitable that capability confusion will follow.
Depending upon what is happening with the underlying process, the four indexes above can be four estimates of one quantity, four estimates of two different quantities, or even four estimates of four different quantities. This variable nature of what these index numbers represent has complicated their interpretation in practice. As a result, many different explanations have been offered. Unfortunately, some of these explanations have been flawed and even misleading.
What the four indexes measure
Using these four components defined above, we see that the capability ratio, Cp, expresses the space available within the specifications as a multiple of the space required by the process when it is centered within the specifications and is operated predictably. It is the space available divided by the space required under the best possible circumstances.
The performance ratio, Pp, expresses the space available within the specifications as a multiple of the space used in the past by this process. If the process has been operated up to its full potential, the space used in the past and the space required by the process will be essentially the same, and the performance ratio will be quite similar to the capability ratio. If the process has not been operated up to its full potential then the space used by the process in the past will always exceed the space required by the process, and the performance ratio will be smaller than the capability ratio. Thus, the agreement between the capability ratio and the performance ratio will characterize the extent to which the process is, or is not, being operated predictably.
The centered capability ratio, Cpk, expresses the effective space available as a multiple of the space required by the process when it is operated predictably at the current average. It is the effective space available divided by the space required. The extent to which the centered capability ratio is smaller than the capability ratio will characterize how far off-center the process is operating.
The centered performance ratio, Ppk, expresses the effective space available as a multiple of the space used by the process in the past. This ratio essentially describes the process as it is, where it is, without any consideration of what the process has the potential to do. The extent to which the centered performance ratio is smaller than the performance ratio is a characterization of how far off-center the process has been operated.
The relationship between these four indexes may be seen in figure 1. There the top tier represents either the actual capability of a process that is operated predictably, or the hypothetical capability of a process that is operated unpredictably. The bottom tier represents the actual performance of a process that is operated unpredictably. The left side represents what happens when the process is centered at the mid-point of the specifications, while the right side takes into account the effect of having an average value that is not centered at the midpoint of the specifications.
Figure 1: How the capability and performance indexes define the gaps between performance and potential
Thus, the top tier of figure 1 is concerned with the process potential, and the bottom tier describes the process performance. As a process is operated ever more closely to its full potential, the values in the bottom tier will move up to be closer to those in the top tier.
The left side implicitly assumes the process is centered within the specifications; the right side takes into account the extent to which the process may be off-center. As a process is operated closer to the center of the specifications the values on the right will move over to be closer to those on the left.
Thus, when a process is operated predictably and on target, the four indexes will be four estimates of the same thing. This will result in the four indexes being close to each other. (Since the indexes are all statistics, they will rarely be exactly the same.)
When a process is operated predictably but is not centered within the specifications, the discrepancy between the right and left sides of figure 1 will quantify the effects of being off-center. With a predictable process, the two indexes on the right side of figure 1 will both estimate the same thing while the two indexes on the left side will be two estimates of another quantity.
When a process is operated unpredictably, the indexes in the bottom row of figure 1 will be smaller than those in the top row, and these discrepancies will quantify the gap due to unpredictable operation.
When a process is operated unpredictably and off-target, the four indexes will represent four different quantities.
Thus, the capability ratio, Cp, is the best-case value, and the centered performance ratio, Ppk, is the worst-case value. The gap between these two values is the opportunity that exists for improving the current process by operating it up to its full potential.
The capability ratio, Cp, approximates what can be done without reengineering the process. If this best-case value is good enough, then the current process can be made to operate in such a way as to meet the process requirements. Experience has repeatedly shown that it is cheaper to learn how to operate the existing process predictably and on-target than it is to try to upgrade or reengineer that process.
Thus, by comparing the four capability and performance indexes you can quickly and easily get some idea about how a process is being operated. How close is it to being operated up to its full potential? Is it being operated on-target? Will it be necessary to reengineer the process, or can it be made to meet the process requirements without the trouble and expense of reengineering?
Example one
Figure 2 contains 260 observations from a predictable process. The corresponding average and range chart is shown in figure 3. The specifications for this process are 10.0 ± 3.5.
Figure 2: 260 observations from a predictable process (in subgroups of size 5).
Figure 3: Average and range chart for figure 2.
This process has a grand average of 10.15. The specification limits are 6.5 and 13.5. Thus, the distance to nearer specification will be DNS = 13.5 – 10.15 = 3.35. The average range is 4.25. With subgroups of size 5 this latter value results in a value for Sigma(X) of 4.25/2.326 = 1.83. Finally, the global standard deviation statistic is s = 1.847. Thus, the four capability and performance ratios are:
$$C_p = \frac{7.0}{6(1.83)} = 0.64 \qquad P_p = \frac{7.0}{6(1.847)} = 0.63$$
$$C_{pk} = \frac{2(3.35)}{6(1.83)} = 0.61 \qquad P_{pk} = \frac{2(3.35)}{6(1.847)} = 0.60$$
Here all four indexes tell the same story. They all might be taken to be estimates of the same quantity. Even without the average and range chart of figure 3 we could tell that this process was being operated predictably and is fairly well-centered within the specifications. The fact that these indexes are all near 60 percent implies that this process is not capable of meeting the specifications even though it is being operated up to its full potential.
Example two
Raw materials for a compound are dry-mixed in a pharmaceutical blender. The recipe calls for batches that are supposed to weigh 1,000 kg. If the weight of a batch is off, then presumably the recipe is also off. As each batch is dumped out of the blender the weight is recorded. Figure 4 shows the weights of all 259 batches produced during one week. The values are in time-order by rows. The XmR chart for these values is shown in figure 5. The limits shown were based on the first 45 values. There are points outside the limits within this baseline period, and the process deteriorates as the week progresses.
Figure 4: Batch weights in kilograms for 259 batches of a compound.
Figure 5: XmR chart for the batch weight data.
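The specifications for the batch weights are 900 kg to 1,100 kg. With an average moving range of 27.84 the value for Sigma(X) is 27.84/1.128 = 24.7 kg. The global standard deviation statistic for all 259 values is s = 61.3 kg. With an average of 936.9, the DNS value is 36.9 kg. Thus, the four indexes are:
$$C_p = \frac{200}{6(24.7)} = 1.35 \qquad P_p = \frac{200}{6(61.3)} = 0.54$$
$$C_{pk} = \frac{2(36.9)}{6(24.7)} = 0.50 \qquad P_{pk} = \frac{2(36.9)}{6(61.3)} = 0.20$$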
The discrepancy between the capability ratio and the performance ratio shows that this process is being operated unpredictably. The discrepancy between the centered performance ratio and the performance ratio shows that the average is not centered within the specifications. The capability ratio describes what the current process is capable of doing when operated predictably and on target. The centered performance ratio describes the train wreck of what they actually accomplished during this week, and the gap between these two indexes describes the opportunity that exists for this process.
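The arithmetic for both examples can be checked with a short Python sketch that uses only the summary statistics quoted in the text. The function name and dictionary layout are illustrative choices; the raw data in figures 2 and 4 are not needed.

```python
def four_indexes(usl, lsl, average, sigma_x, s):
    """Return Cp, Pp, Cpk, Ppk using the distance to nearer specification (DNS)."""
    tolerance = usl - lsl
    dns = min(usl - average, average - lsl)
    return {
        "Cp": tolerance / (6 * sigma_x),
        "Pp": tolerance / (6 * s),
        "Cpk": 2 * dns / (6 * sigma_x),
        "Ppk": 2 * dns / (6 * s),
    }


# Example one: subgroups of size 5, so Sigma(X) = average range / 2.326.
example_one = four_indexes(13.5, 6.5, 10.15, 4.25 / 2.326, 1.847)
# Example two: individual values, so Sigma(X) = average moving range / 1.128.
example_two = four_indexes(1100, 900, 936.9, 27.84 / 1.128, 61.3)

for name, result in (("one", example_one), ("two", example_two)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

For example one the four values cluster between 0.60 and 0.64. For example two they spread from a capability ratio of 1.35 down to a centered performance ratio of 0.20, which is the gap between potential and performance described above.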
As shown in these examples, each of the four index numbers makes a specific comparison between the specified tolerance or the effective space available and either the within-subgroup variation or the global standard deviation statistic. In an effort to distinguish between the capability indexes and the performance indexes the performance indexes have sometimes been called “long-term capability indexes.” This nomenclature is misleading and inappropriate.
The idea behind the terminology of long-term capability is that if you just collect enough data over a long enough period of time you will end up with a good estimate of the process capability. To illustrate how this is supposed to work we will use data from example one to perform a sequence of computations using successively more and more data at each step. Although we would not normally perform the computations in this way in practice, we do so here to see how increasing amounts of data affect the computation of performance and capability ratios.
We begin with the first eight subgroups. The global standard deviation statistic for these 40 values is 1.974. The specifications are 6.5 to 13.5, so our USL – LSL = 7.0. Using these values we get a performance ratio of 0.591. The average range for these eight subgroups is 4.375, so Sigma(X) is 1.881, and with this value we get a capability ratio of 0.620. It is instructive to note how close these values are to the values found using all the data in example one above.
The first 12 subgroups contain 60 values. The global standard deviation statistic for these 60 values is 1.742. Using this value we get a performance ratio of 0.670. The average range for these 12 subgroups is 3.833, so Sigma(X) is 1.648, and with this value we get a capability ratio of 0.708.
The first 16 subgroups contain 80 values. The global standard deviation statistic for these 80 values is 1.678. Using this value we get a performance ratio of 0.691. The average range for these 16 subgroups is 3.875, so Sigma(X) is 1.666, and with this value we get a capability ratio of 0.700.
Continuing in this manner, adding 20 more values at each step, we get the performance ratios and capability ratios shown in figure 6. There we see that as we use greater amounts of data in the calculations these ratios settle down and get closer and closer to a value near 0.640.
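The sequence of computations can be sketched in Python as a generator that recomputes both ratios from the first 8, 12, 16, ... subgroups. The data used here are synthetic draws from one fixed normal distribution (an assumption standing in for the figure 2 values, which are not reproduced in the text), and the function name and seed are arbitrary.

```python
import random
import statistics

D2_SIZE_5 = 2.326  # bias correction factor for ranges of subgroups of size 5


def running_ratios(subgroups, usl, lsl, start=8, step=4):
    """Yield (n_values, Pp, Cp) computed from the first k subgroups of size 5."""
    tolerance = usl - lsl
    for k in range(start, len(subgroups) + 1, step):
        used = subgroups[:k]
        values = [x for g in used for x in g]
        s = statistics.stdev(values)  # global standard deviation statistic
        r_bar = sum(max(g) - min(g) for g in used) / k
        sigma_x = r_bar / D2_SIZE_5  # within-subgroup measure of dispersion
        yield len(values), tolerance / (6 * s), tolerance / (6 * sigma_x)


# Synthetic predictable process: 52 subgroups of 5 from a single distribution.
rng = random.Random(0)
synthetic = [[rng.gauss(10.0, 1.8) for _ in range(5)] for _ in range(52)]
for n_values, pp, cp in running_ratios(synthetic, usl=13.5, lsl=6.5):
    print(n_values, round(pp, 3), round(cp, 3))
```

With homogeneous data of this kind, the two columns should wander early on and then stay close to each other as more subgroups are included.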
Figure 6: Performance and capability ratios for example one converge with increasing amounts of data.
Of course, as may be seen above, when a process is operated predictably, the capability ratio and the performance ratio both estimate the same quantity. Thus, when a process is operated up to its full potential there is no distinction to be made between the short-term capability and the long-term capability. Both computations describe the actual capability of the predictable process.
The convergence of a statistic to some asymptotic value that occurs with increasing amounts of data that is seen in figure 6 is the idea behind many things we do in statistics. Unfortunately, this convergence only happens when the data are homogeneous. In order to see what happens with a process that is not operated up to its full potential, we shall repeat the exercise above using the data from example two.
The first 40 batch weights have a global standard deviation statistic of 41.60. The specifications are 900 to 1,100, so our specified tolerance is USL – LSL = 200. Using these values we get a performance ratio of 0.801. The average moving range for these 40 values is 29.10, so Sigma(X) is 25.80, and with this value we get a capability ratio of 1.292.
The first 60 batch weights have a global standard deviation statistic of 44.20. Using this value we get a performance ratio of 0.754. The average moving range for these 60 values is 25.76, so Sigma(X) is 22.84, and with this value we get a capability ratio of 1.459.
Continuing in this manner, adding 20 more values at each step, we get the performance ratios and capability ratios shown in figure 7. For the sake of comparison, both figure 6 and figure 7 use the same horizontal and vertical scales.
Figure 7: Neither the performance ratio nor the capability ratio for example two settles down to some fixed value with increasing amounts of data.
To what value is the performance ratio curve in figure 7 converging? After 120 values it appears to be approaching 0.80, then with 20 additional values it suddenly drops down to the neighborhood of 0.70. After 180 values it seems to be approaching 0.70, then with 20 more values it drops down to the neighborhood of 0.60. After 240 values we are still in the vicinity of 0.60, but then with 259 values we drop down to 0.54. So which value are you going to use as your long-term capability? 0.80? 0.70? 0.60? or 0.54?
Here we see that even though we use ever greater amounts of data, the ratios do not settle down to any particular value. Neither do we see the agreement between the performance ratio and the capability ratio that was evident in figure 6. Clearly these two ratios characterize different aspects of the data in this case. Both the migration and the estimation of different things happen because this process is changing over time. Because of these changes there is no magic amount of data that will result in a “good number.” The computations are chasing a moving target. The question “What is the long-term capability of this process?” is meaningless simply because there is no such quantity to be estimated regardless of how many data we might use.
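The moving-target behavior can be imitated with a hedged Python sketch for individual values. The data are synthetic (a process whose level drops every 40 values, which is an assumption and not the batch weight data of figure 4), and the function name and seed are arbitrary. Sigma(X) is taken as the average moving range divided by 1.128, as in example two.

```python
import random
import statistics


def running_ratios_individuals(values, usl, lsl, start=40, step=20):
    """Return a list of (n, Pp, Cp) computed from the first n individual values."""
    tolerance = usl - lsl
    results = []
    for n in range(start, len(values) + 1, step):
        used = values[:n]
        s = statistics.stdev(used)  # global standard deviation statistic
        mr_bar = sum(abs(b - a) for a, b in zip(used, used[1:])) / (n - 1)
        sigma_x = mr_bar / 1.128  # average moving range / d2 for n = 2
        results.append((n, tolerance / (6 * s), tolerance / (6 * sigma_x)))
    return results


# Synthetic unpredictable process: the level shifts down after every 40 values.
rng = random.Random(0)
values = [rng.gauss(level, 25.0) for level in (1000, 960, 920) for _ in range(40)]
results = running_ratios_individuals(values, usl=1100, lsl=900)
for n, pp, cp in results:
    print(n, round(pp, 3), round(cp, 3))
```

As the shifted values enter the computation, the performance ratio falls while the capability ratio, which depends only on point-to-point differences, stays comparatively high; the performance ratio obtained depends on where in the record the computation stops.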
With an unpredictable process, as we use greater amounts of data in our computation we eventually combine values that were obtained while the process was acting differently. This combination of unlike values does not prevent us from computing our summary statistics, but it does complicate the interpretation of those statistics. With an unpredictable process there is no single value for the process average, or the process variation, or the process capability. All such notions of process characteristics become chimeras, and any attempt to use our statistics to estimate these nonexistent process characteristics is an exercise in frustration. This is why the idea of long-term capability is just so much nonsense.
However, once we understand that we are working with an unpredictable process, we are free to use our statistics to characterize different aspects of the data (as opposed to the process). As noted earlier, the capability ratio of 1.35 computed from the first 45 values of example two provides an approximation of what this process has the potential to do. In the same manner, the centered performance ratio of 0.20 describes what was done during this week. And the difference between these two statistics characterizes the gap between performance and potential. Thus, we may use the capability and performance indexes to identify opportunities even when they do not estimate fixed aspects of the underlying process.
Thus, referring to the performance indexes as long-term capabilities confuses the issue and misleads everyone. They are descriptive statistics that summarize the past. They do not estimate any fixed quantity unless the process is being operated predictably. And they definitely do not describe the indescribable “long-term capability of an unpredictable process.”
About The Author
Donald J. Wheeler
Dr. Donald J. Wheeler is a Fellow of both the American Statistical Association and the American Society for Quality, and is the recipient of the 2010 Deming Medal. As the author of 25 books and hundreds of articles, he is one of the leading authorities on statistical process control and applied data analysis. Find out more about Dr. Wheeler’s books at www.spcpress.com.
Dr. Wheeler welcomes your questions. You can contact him at email@example.com.