How is OEE calculated?

OEE (Overall Equipment Effectiveness) is calculated as Availability × Performance × Quality. It measures the percentage of manufacturing time that is truly productive against full potential during the periods when equipment is scheduled to run.

Where does the 85% world-class OEE benchmark come from?

The 85% benchmark is the mathematical product of three component targets proposed by Seiichi Nakajima in 1988: Availability ≥ 90%, Performance ≥ 95%, Quality ≥ 99%. Multiplied together (0.90 × 0.95 × 0.99 = 0.84645) this rounds to 85%. It is not an empirical statistic from a study of high-performing plants — it is a derived target.

What is PPLH (Parts Per Labour Hour)?

PPLH is a productivity ratio measuring how many good parts are produced per labour hour worked. It is calculated as Total Good Parts Produced ÷ Total Direct Labour Hours and is used in labour-intensive assembly environments.

What is the difference between Cp, Cpk, Pp, and Ppk?

Cp and Cpk are capability indices that use within-subgroup variation; Pp and Ppk are performance indices that use overall variation. Cpk and Ppk account for process centering relative to the specification limits; Cp and Pp do not. Cpk is the most common day-to-day index.

What is an MES (Manufacturing Execution System)?

A Manufacturing Execution System is software that connects to production equipment and personnel to collect real-time data, track production against schedules, and report on OEE, downtime, and quality. It operates between the ERP layer and the shop floor.

What is IIoT (Industrial Internet of Things)?

IIoT refers to the network of connected industrial devices, sensors, and machines that share data over digital networks. It is the connectivity layer that lets production equipment send data to software systems for monitoring, analysis, and control.

What is MTBF (Mean Time Between Failures)?

MTBF is the average time between unplanned equipment failures. It is a key indicator of equipment reliability and is used in maintenance planning and capital decisions.

What is MTTR (Mean Time to Repair)?

MTTR is the average time required to restore equipment to operation after a failure. It is a key indicator of maintenance responsiveness and is tracked alongside MTBF.

What is the difference between cycle time, takt time, lead time, and throughput time?

Cycle Time is the measured time per cycle at a machine. Takt Time is the pace of customer demand (Available Production Time ÷ Customer Demand). Lead Time is order-to-delivery, customer-facing. Throughput Time is raw-material-in to finished-product-out, internal.

What is the Duties Relief Program (DRP)?

The Duties Relief Program is a Canada Border Services Agency program that can allow eligible importers to bring goods into Canada without paying duties when those goods are later exported, either as-is or after being used in production.

Reference

Manufacturing Glossary

What is FTT (First Time Through) in manufacturing?

First Time Through (FTT) is the percentage of units that pass all quality checks and process steps without rework, repair, or scrap on the first attempt. It is a key KPI in assembly and manufacturing quality programs.

What is a CMMS (Computerized Maintenance Management System)?

A CMMS is software for managing maintenance work orders, preventive maintenance schedules, parts inventory, and equipment history. It is the system of record for maintenance teams and reliability engineers.

Plain-English definitions for the terms manufacturing teams actually use, with the operational nuance most reference sources leave out. Covers OEE and the 85% benchmark, time and stoppage analysis, quality systems, maintenance and reliability, lean methodology, industrial protocols, and Canadian customs compliance.

Production & OEE

OEE (Overall Equipment Effectiveness)

The percentage of manufacturing time that is truly productive. Calculated as Availability × Performance × Quality. OEE measures utilization against full potential during the periods when equipment is scheduled to run.

OEE is the most widely used composite metric in production reporting. It is also the most commonly miscalculated, primarily through inconsistent treatment of the time base. The components must be measured against the same productive time, and non-productive periods (scheduled breaks, planned shutdowns) must be excluded uniformly. See OEE Tracking →
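
As a minimal sketch of the time-base point above, the calculation can be written so that all three components share one productive-time denominator. The field names, the sample shift, and the Performance formula (ideal cycle time × parts ÷ run time, a common formulation) are illustrative assumptions, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class ShiftCounts:
    """Hypothetical inputs for one shift; field names are illustrative."""
    scheduled_s: float       # total shift length, seconds
    non_productive_s: float  # scheduled breaks, planned shutdowns
    stopped_s: float         # stops inside productive time
    ideal_cycle_s: float     # configured ideal cycle time per part
    total_parts: int
    good_parts: int


def oee(c: ShiftCounts) -> dict:
    # Exclude non-productive time once, up front, so every component
    # is measured against the same time base.
    productive_s = c.scheduled_s - c.non_productive_s
    run_s = productive_s - c.stopped_s
    availability = run_s / productive_s
    performance = (c.ideal_cycle_s * c.total_parts) / run_s
    quality = c.good_parts / c.total_parts
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Made-up shift: 8 hours scheduled, 1 hour of breaks, 42 minutes stopped.
shift = ShiftCounts(8 * 3600, 3600, 2520, 12.0, 1701, 1650)
result = oee(shift)
print({name: round(value, 3) for name, value in result.items()})
```

Subtracting non-productive time in one place is the design choice that matters here: it prevents Availability and Performance from silently using different denominators.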

The 85% World-Class OEE Benchmark, and Where It Comes From

The 85% number is the most quoted benchmark in manufacturing operations and one of the most often misinterpreted. It is not an empirical statistic from a survey of high-performing plants. It is the mathematical product of three component targets proposed by Seiichi Nakajima, the engineer credited with formalizing Total Productive Maintenance (TPM) at the Japan Institute of Plant Maintenance, in his 1988 book Introduction to TPM: Total Productive Maintenance (Productivity Press).

Nakajima proposed the following thresholds as ambitious but achievable targets for a well-run discrete manufacturing operation:

Availability ≥ 90%
Performance ≥ 95%
Quality ≥ 99%

Multiplied together, these produce 0.90 × 0.95 × 0.99 = 0.84645, which rounds to 85%. That is the entire derivation. There was no empirical study and no industry survey. The number is the arithmetic product of three component targets.

Three implications follow from this that the benchmark's popular use tends to obscure.

An 85% OEE result is only meaningful if the components are balanced. A plant reaching 85% through (for example) 100% Availability × 90% Performance × 94% Quality is producing the right number, but the underlying performance is not what Nakajima described. Many plants reach 85% by being weak on one component and compensating with another. The math is satisfied; the operational result is not.

The "world-class" label depends on industry and process type. Continuous-process industries such as chemicals, fluids, and gases routinely exceed 90% OEE because they run with few or no changeovers. Discrete and high-mix plants operating at 65% may be performing better, relative to their constraints, than a continuous-process plant at 90%. The 85% benchmark is not portable across industry types without context.

Nakajima estimated industry-typical OEE at approximately 60% in his era. He was not proposing 85% as a median performance level for industry. He proposed it as a stretch target that required balanced execution across all three components.

The right follow-up to a stated 85% OEE target is not whether it is achievable. It is whether the underlying component targets are balanced and whether the process type makes 85% the right ceiling.
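
The balance question can be checked mechanically. The sketch below compares each component against Nakajima's three thresholds and reports the shortfalls. The function and dictionary names are hypothetical, and the second example reuses the unbalanced 100% × 90% × 94% case from the text.

```python
# Component targets quoted above from Nakajima (1988).
NAKAJIMA_TARGETS = {"availability": 0.90, "performance": 0.95, "quality": 0.99}


def component_gaps(availability: float, performance: float, quality: float) -> dict:
    """Return the shortfall for every component below its target."""
    actual = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    return {
        name: round(actual[name] - target, 4)
        for name, target in NAKAJIMA_TARGETS.items()
        if actual[name] < target
    }


balanced_oee = 0.90 * 0.95 * 0.99    # the derivation of the benchmark
unbalanced_oee = 1.00 * 0.90 * 0.94  # similar headline number, different plant

print(round(balanced_oee, 5), round(unbalanced_oee, 3))
print(component_gaps(0.90, 0.95, 0.99))  # no shortfalls
print(component_gaps(1.00, 0.90, 0.94))  # performance and quality fall short
```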

Availability

The percentage of scheduled time that equipment is actually running, rather than down for unplanned or planned stops. One of the three components of OEE.

Performance

The speed at which equipment runs relative to its designed maximum speed. A machine running slower than its ideal cycle time has reduced Performance. One of the three components of OEE.

Quality Rate

The percentage of units produced that meet quality standards without rework. One of the three components of OEE.

The Six Big Losses

The Six Big Losses framework was published by Seiichi Nakajima alongside OEE in his 1988 Introduction to TPM. It is the breakdown most commonly used to attribute OEE losses back to specific operational causes. Each loss maps to one of the three OEE components.

Availability losses

Equipment Failure (also called breakdowns). Unplanned stops caused by a machine failing while it was expected to be running, anywhere from a motor burning out to a sensor jam.
Setup and Adjustment. Time lost during planned changeovers, including setup, alignment, and the validation period before stable production resumes.

Performance losses

Idling and Minor Stops. Brief stops, typically under five minutes, that do not warrant a downtime code but accumulate substantially over a shift. Common causes include sensor false triggers, parts not seated correctly, and small jams.
Reduced Speed. Time during which the machine is running but cycling slower than the ideal cycle time. Common causes include tooling wear, material variation, and operator-imposed slowdowns to manage downstream issues.

Quality losses

Process Defects. Scrap and rework produced during stable production conditions, after the line has reached validated output.
Reduced Yield (Startup Losses). Scrap and rework produced during startup, before the line has stabilized. Counted separately from process defects because the cause is fundamentally different.

The framework is useful because it pushes teams away from looking only at the OEE number and toward the specific loss type that is dragging it down. Two plants both running at 70% OEE can have completely different improvement priorities depending on which of the six losses dominates.

OEU (Overall Equipment Utilization)

A variant of OEE that excludes the impact of upstream and downstream constraints on the machine being measured. OEU answers a different question than OEE: not "how well is the line performing," but "how well would this specific machine be performing if it were not held up by the rest of the line?"

Where OEE penalizes a machine for being blocked or starved, OEU treats those uptime events as outside the machine's control and removes them from the calculation. The result tends to be higher than OEE on the same asset.

OEU is the right metric for evaluating the machine itself, for example when deciding whether to replace it, retool it, or change its ideal cycle time. OEE remains the right metric for evaluating the line as a whole.

TEEP (Total Effective Equipment Performance)

OEE measured against calendar time rather than scheduled time. Calculated as OEE × Utilization, where Utilization is the fraction of total calendar time during which the machine was scheduled to run.

A plant running a single shift on weekdays might have 75% OEE during scheduled time and a TEEP of about 18%. The TEEP number is small because the machine is sitting idle for most of the calendar week. TEEP is the right metric when the question is whether to add shifts, run weekends, or invest in additional capacity. OEE answers a question about how well scheduled time is being used; TEEP answers a question about how much capacity the asset has on the table.
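
The arithmetic behind the single-shift example can be sketched as follows, assuming an 8-hour shift five days a week. The shift length is an assumption, since the text only says "a single shift on weekdays".

```python
HOURS_PER_WEEK = 7 * 24  # calendar time: 168 hours


def teep(oee: float, scheduled_hours_per_week: float) -> float:
    """TEEP = OEE x Utilization, with Utilization measured against calendar time."""
    utilization = scheduled_hours_per_week / HOURS_PER_WEEK
    return oee * utilization


# Assumed schedule: one 8-hour shift, Monday to Friday.
single_shift = teep(0.75, scheduled_hours_per_week=5 * 8)
print(f"{single_shift:.1%}")
```

Under that assumption Utilization is 40 ÷ 168, or about 24%, which is what pulls a 75% OEE down to a TEEP of roughly 18%.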

Speed Loss

The performance loss caused by a machine cycling more slowly than its ideal cycle time. This is the formal name for one of the two performance components in the Six Big Losses framework. Speed loss is distinct from idling and minor stops, which are very brief stoppages rather than reduced cycling rate.

Speed loss often signals tooling wear, material variation, or operator-imposed slowdowns. It is harder to detect than downtime because the machine appears to be running, but the cumulative effect on output can match or exceed visible downtime.

Idling and Minor Stops

Stoppages too brief to warrant a downtime reason code, typically under five minutes. Causes include sensor false triggers, parts not seated correctly, and micro-jams cleared by the operator without intervention. Each event is small. Accumulated across a shift they often represent more lost time than the visible downtime events that dominate the loss report.

The challenge with idling and minor stops is operational rather than analytical. They tend to be undercounted because operators do not register them as "stops." Capturing them accurately requires either automated detection (the system itself notices the cycle gap) or a dedicated minor-stop category that is faster to enter than a regular reason code.

Reduced Speed Losses

The category that includes both speed loss (the machine is running slower than ideal cycle time) and idling and minor stops (the machine is briefly stopping). Together these form the Performance component of OEE. They are grouped because operationally they look similar from a throughput perspective: the machine is technically up, but it is not delivering the cycles it should.

Best Rate vs Target Rate

A diagnostic comparison between the ideal cycle time configured for a product and the best actual cycle rate the line has achieved on that product over a measurement period.

A value above 100% indicates the line has run faster than its configured ideal rate at some point. The ideal cycle time is too conservative and should be reviewed.
A value near 100% indicates the ideal cycle time is approximately correct.
A value below 100% indicates the ideal cycle time is set faster than the line has ever actually achieved. The configured target may be unreachable, which makes the entire OEE Performance number unreliable.

Best Rate vs Target Rate is one of the most useful diagnostics for validating that an OEE system is configured correctly. A plant whose dashboards show 60% OEE on a line where Best Rate vs Target Rate has been stuck below 90% for months is not under-performing. It is mis-configured.
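
One way to compute the diagnostic, as a sketch: because rate is the inverse of cycle time, best rate ÷ target rate equals ideal cycle time ÷ best observed cycle time. The function name and the sample cycle times are hypothetical.

```python
def best_rate_vs_target(ideal_cycle_s: float, observed_cycle_s: list[float]) -> float:
    """Ratio of the best achieved rate to the configured ideal rate.

    Above 1.0: the line has beaten the configured ideal (ideal too conservative).
    Below 1.0: the line has never reached the configured ideal.
    """
    if not observed_cycle_s:
        raise ValueError("need at least one observed cycle")
    best_cycle_s = min(observed_cycle_s)  # shortest cycle = best rate
    return ideal_cycle_s / best_cycle_s


# Ideal configured at 10 s, but the fastest cycle ever seen is 12 s.
unreachable = best_rate_vs_target(10.0, [12.0, 12.4, 13.1])
# Ideal configured at 10 s, and the line has run a 9 s cycle.
conservative = best_rate_vs_target(10.0, [9.0, 10.2, 10.5])
print(f"{unreachable:.0%} {conservative:.0%}")
```

In practice a single fast cycle can be a sensor glitch, so a production version would probably use a sustained best rate over a window rather than the raw minimum.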

FTT (First Time Through)

The percentage of units that pass all quality checks and process steps without rework, repair, or scrap on the first attempt. A key KPI in assembly and manufacturing quality programs. See FTT in DQS →
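
A minimal sketch of the calculation, assuming each unit is counted in at most one of the rework, repair, or scrap categories. The function name and the counts are illustrative.

```python
def first_time_through(units_started: int, reworked: int, repaired: int, scrapped: int) -> float:
    """Share of units that passed every step on the first attempt.

    Assumes no unit is counted in more than one of the three categories.
    """
    passed_first_time = units_started - reworked - repaired - scrapped
    return passed_first_time / units_started


ftt = first_time_through(units_started=500, reworked=20, repaired=5, scrapped=10)
print(f"{ftt:.1%}")  # 465 of 500 units
```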

PPLH (Parts Per Labour Hour)

A productivity ratio measuring how many good parts are produced per labour hour worked. Used in labour-intensive assembly environments to track productivity and staffing efficiency.

Andon

A visual alert system originating from the Toyota Production System. Andon alerts notify supervisors and maintenance staff when a machine stops, a quality problem occurs, or production falls behind target. See Andon Alerts →

Downtime

Any period when a machine or production line is not producing. Categorized as planned (scheduled maintenance, changeovers) or unplanned (breakdowns, material shortages). See Downtime Tracking →

Time, Allowances & Reporting

Cycle Time vs Takt Time vs Lead Time vs Throughput Time

Four time metrics that are commonly confused. The differences are operational, not semantic.

Cycle Time is the actual time it takes to complete one cycle of production at a machine or station. It is measured. A press that fires every 12 seconds has a 12-second cycle time.
Ideal Cycle Time is the theoretical fastest cycle time the machine can achieve at a given product. It is configured and used as the basis for OEE Performance calculations.
Takt Time is the pace of customer demand, calculated as Available Production Time ÷ Customer Demand. If customers want 480 parts a day and the line runs eight hours, takt time is 60 seconds. Takt is a target rate, not a measured rate.
Lead Time is the time from a customer placing an order to receiving the finished product. It is customer-facing and includes order queueing, manufacturing, and shipping.
Throughput Time is the time from raw material entering the plant to finished product leaving. It is internal and excludes the order-queue and post-shipping legs of lead time.

The most common mistake is treating cycle time and takt time as interchangeable. They are not. Cycle time describes what the machine does; takt time describes what the customer needs. When cycle time exceeds takt time, the line is behind demand. When cycle time is much shorter than takt time, the operation is either overproducing or carrying schedule slack.
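
The takt calculation and the cycle-versus-takt comparison can be sketched as below, using the 480-parts, eight-hour example from the definitions above. The function names and status strings are hypothetical.

```python
def takt_time_s(available_production_s: float, customer_demand_units: int) -> float:
    """Takt Time = Available Production Time / Customer Demand."""
    return available_production_s / customer_demand_units


def pace_status(cycle_s: float, takt_s: float) -> str:
    """Compare measured cycle time with the demand-driven target."""
    if cycle_s > takt_s:
        return "behind demand"
    if cycle_s < takt_s:
        return "ahead of demand (overproducing or carrying slack)"
    return "matched to demand"


takt = takt_time_s(available_production_s=8 * 3600, customer_demand_units=480)
print(takt)                     # 60.0 seconds per part
print(pace_status(72.0, takt))  # behind demand
print(pace_status(45.0, takt))  # ahead of demand (overproducing or carrying slack)
```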

Setup Time vs Startup Time vs Changeover Time

Changeover is often used as a single word for what is operationally three distinct phases.

Setup Time is the interval from the last cycle of the previous product to the first cycle of the new product. The machine is not producing.
Startup Time (also called validation time) is the interval from the first cycle to the nth validated cycle of the new product. The machine is producing, but production is not yet considered in spec.
Changeover Time is the full event, setup plus startup, from the last cycle of the old product to the nth validated cycle of the new product.

A 30-minute "changeover" might be 20 minutes of setup and 10 minutes of startup. Treating these as a single number hides where the time actually goes, and which one to attack for SMED (Single-Minute Exchange of Die) improvements.
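
Given the three timestamps that bound a changeover, the split can be sketched like this. The function name and the clock times are made up to reproduce the 20-plus-10-minute example.

```python
from datetime import datetime, timedelta


def changeover_phases(
    last_old_cycle: datetime,
    first_new_cycle: datetime,
    nth_validated_cycle: datetime,
) -> dict[str, timedelta]:
    """Split one changeover event into its setup and startup phases."""
    setup = first_new_cycle - last_old_cycle         # machine not producing
    startup = nth_validated_cycle - first_new_cycle  # producing, not yet validated
    return {"setup": setup, "startup": startup, "changeover": setup + startup}


phases = changeover_phases(
    last_old_cycle=datetime(2024, 3, 1, 10, 0),
    first_new_cycle=datetime(2024, 3, 1, 10, 20),
    nth_validated_cycle=datetime(2024, 3, 1, 10, 30),
)
for name, duration in phases.items():
    print(name, duration)
```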

Reporting Day vs Actual Day

An overnight shift starts on one calendar day and ends on the next. Which day owns its production?

Actual Day is the calendar day a time period falls on.
Reporting Day is a logical day used for aggregation. Conventionally, an overnight shift such as Friday 11pm to Saturday 7am is attributed to Friday's Reporting Day rather than split across two calendar days.

This distinction is the most common source of mismatched dashboards and reports. Aggregations using Actual Day will split overnight shifts; aggregations using Reporting Day will not. Pick one convention per system and apply it everywhere.
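
One simple way to implement a Reporting Day convention is a fixed day-start hour: any timestamp before that hour belongs to the previous calendar day. The sketch below assumes a 07:00 boundary, which matches the Friday 11pm to Saturday 7am example. Real systems may instead look up the shift that contains the timestamp.

```python
from datetime import date, datetime, timedelta

REPORTING_DAY_START_HOUR = 7  # assumption: the reporting day runs 07:00 to 07:00


def actual_day(ts: datetime) -> date:
    """Calendar day the timestamp falls on."""
    return ts.date()


def reporting_day(ts: datetime, day_start_hour: int = REPORTING_DAY_START_HOUR) -> date:
    """Logical day used for aggregation; overnight hours roll back one day."""
    return (ts - timedelta(hours=day_start_hour)).date()


friday_11pm = datetime(2024, 3, 1, 23, 0)  # 1 March 2024 was a Friday
saturday_3am = datetime(2024, 3, 2, 3, 0)
saturday_7am = datetime(2024, 3, 2, 7, 0)

print(actual_day(saturday_3am), reporting_day(saturday_3am))
print(reporting_day(friday_11pm), reporting_day(saturday_7am))
```

Whichever rule is chosen, routing every aggregation through one shared function is what keeps dashboards and reports from disagreeing.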

Productive vs Non-Productive Time

Productive Time is the time a machine is expected to be producing. It is the denominator for OEE.
Non-Productive Time covers scheduled breaks, lunches, plant meetings, and planned shutdowns. It is excluded from OEE when handled correctly.

The trap on either side is allowing the schedule to drift from reality. Treating an unscheduled team huddle as productive time penalizes OEE for time the line was not expected to run. Treating a scheduled break as productive time has the same effect. The fix is keeping the schedule accurate, not adjusting the OEE math.

Load/Unload, Changeover, and Validation Allowances

Allowances are grace periods. They define a configurable interval during which a recognized production loss is excluded from OEE so the team is not penalized for unavoidable activity.

Load/Unload Allowance. For manual load/unload operations, a configurable number of seconds before an "impact-starved" condition starts counting against OEE. Without this, every manual cycle would carry a small starved-condition penalty for the load/unload time itself.
Changeover Allowance. A configurable number of minutes at the start of a changeover that are excluded from OEE, reflecting that some changeover duration is unavoidable. Time beyond the allowance starts counting.
Validation Allowance. A configurable number of minutes at the start of a validated production run, allowing the line to ramp from first piece to validated output without penalty. Time beyond the allowance counts.

Allowances are configuration choices rather than universal rules. Setting them too generously hides real losses. Setting them too tightly penalizes teams for the cost of doing business. The right setting reflects what is unavoidable in the operation, not what is normal.
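
The grace-period logic is the same for all three allowance types and can be sketched in a few lines. The function name and the durations are illustrative.

```python
def counted_loss_s(event_duration_s: float, allowance_s: float) -> float:
    """Portion of an event that counts against OEE once the allowance is used up."""
    return max(0.0, event_duration_s - allowance_s)


# A 30-minute changeover with a 20-minute changeover allowance.
over_allowance = counted_loss_s(30 * 60, 20 * 60)
# A 15-minute changeover with the same allowance: nothing counts.
within_allowance = counted_loss_s(15 * 60, 20 * 60)
print(over_allowance, within_allowance)
```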

Stoppage Taxonomy

Stoppage

A general term for any non-cycling event on a machine. In well-instrumented systems, stoppages are classified into three mutually exclusive types: Downtime, Uptime, and Changeover. A fourth category, Machine Faults, logs in parallel to these.

Downtime Event

A machine over-cycle. The machine started cycling but did not complete its operation within the expected time of the ideal cycle. Duration is measured from when the cycle should have completed to when it actually did. Downtime is the classic "machine stopped" event and is distinct from blocked or starved conditions, which are uptime events.

Uptime Event (Blocked / Starved)

The machine itself is functional, but it cannot produce because of an upstream or downstream constraint.

Blocked. The machine has nowhere to place the part it just produced. The downstream queue is full.
Starved. The machine has no material to work on. The upstream feed has stopped.

Uptime events are sometimes called impact-blocked or impact-starved to distinguish them from brief, expected blocked or starved conditions during normal load/unload. Distinguishing uptime events from downtime events is the reason OEU (Overall Equipment Utilization) exists as a separate metric. OEU strips out upstream and downstream constraints to show how the machine itself is performing.

Changeover Event

The interval from the last cycle of the previous product to the nth validated cycle of the new product, captured automatically when a product change is detected. See Setup Time vs Startup Time vs Changeover Time for the breakdown of phases inside a changeover.

Machine Fault

A PLC fault code logged for diagnostic purposes. Machine faults log in parallel to stoppage events. A machine in a changeover or downtime event can simultaneously carry one or more active fault codes. This is why a stoppage report and a fault report rarely reconcile cleanly: they track different layers of the same event. Stoppage reports describe what the machine was doing; fault reports describe what the machine was reporting.

No Demand

A legitimate non-productive period during which the schedule indicates production could run, but there is no work to do. Causes include no orders, upstream feed stopped, or scheduled idle. Operations that do not handle "no demand" correctly penalize teams for conditions outside their control. Most production reporting systems handle this by allowing certain reason codes to mark time as "no demand" rather than as availability loss, preserving the OEE math while reflecting reality on the floor.

Quality & Compliance

SPC (Statistical Process Control)

A method of monitoring and controlling a manufacturing process using control charts. Detects process drift before defects occur. Common chart types include Xbar-R, Xbar-S, and individuals charts. See SPC →

Cp vs Cpk vs Pp vs Ppk

Four process capability indices that get used interchangeably and almost never should be. Two distinctions matter.

Capability (Cp / Cpk) vs Performance (Pp / Ppk). Capability indices use within-subgroup variation only. They tell you what the process is capable of when it is statistically in control. Performance indices use overall variation, including the variation between subgroups over time. They tell you what the process actually did. A process can have an excellent Cpk and a poor Ppk if it drifts between samples.

Centered (Cp / Pp) vs accounting for centering (Cpk / Ppk). Cp and Pp assume the process is perfectly centered between specification limits and tell you whether the spread fits if it were. Cpk and Ppk account for how off-center the process actually is. Cpk is always less than or equal to Cp; the same is true for Ppk and Pp. They are equal only when the process mean sits exactly midway between the specification limits.

Common benchmarks. A Cpk or Ppk of 1.33 is generally accepted as adequate for most discrete manufacturing. 1.67 is the threshold that AIAG PPAP sets for special characteristics on initial process studies. A Cp of 2.0 corresponds to a six sigma process, where each specification limit sits six standard deviations from a centered mean. A value below 1.0 indicates the process is not capable of meeting the specification.

The trap that most often appears in supplier reports is quoting Cpk where Ppk was actually required. Cpk reflects what the process can do under controlled conditions. Ppk reflects what the process did across the actual measurement window. PPAP submissions explicitly distinguish the two for a reason.
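
The two distinctions can be made concrete with a short sketch. It estimates within-subgroup sigma as a pooled standard deviation, which is one common choice (Rbar/d2 is another, and some packages apply an unbiasing constant), and overall sigma as the standard deviation of all readings. The data, limits, and function name are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance


def capability_indices(subgroups: list[list[float]], lsl: float, usl: float) -> dict:
    all_values = [x for group in subgroups for x in group]
    mu = mean(all_values)

    # Within-subgroup sigma: pooled standard deviation (an assumption).
    dof = sum(len(group) - 1 for group in subgroups)
    pooled_var = sum((len(group) - 1) * variance(group) for group in subgroups) / dof
    sigma_within = sqrt(pooled_var)

    # Overall sigma: includes the drift between subgroups.
    sigma_overall = stdev(all_values)

    def spread(sigma: float) -> float:    # ignores centering
        return (usl - lsl) / (6 * sigma)

    def centered(sigma: float) -> float:  # penalizes an off-center mean
        return min(usl - mu, mu - lsl) / (3 * sigma)

    return {
        "Cp": spread(sigma_within), "Cpk": centered(sigma_within),
        "Pp": spread(sigma_overall), "Ppk": centered(sigma_overall),
    }


# Each subgroup is tight, but the mean drifts upward between subgroups.
drifting = [[9.9, 10.0, 10.1], [10.2, 10.3, 10.4], [10.5, 10.6, 10.7]]
indices = capability_indices(drifting, lsl=9.0, usl=12.0)
print({name: round(value, 2) for name, value in indices.items()})
```

With data like this Cpk comes out well above Ppk, which is the excellent-Cpk, poor-Ppk pattern described above for a process that drifts between samples.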

Track & Trace (Traceability)

The ability to record and retrieve the complete manufacturing history of a specific part or lot — what materials were used, which operator made it, which machine, which quality checks passed. Required in automotive, medical, and food industries. See Track & Trace →

Visual Inspection

A manual or automated quality check using human judgement or machine vision to assess part appearance, dimensions, or assembly correctness. See Visual Inspection →

Audit-Ready

A state where quality records, traceability data, and inspection results are organized and immediately retrievable in a format acceptable for customer or third-party audits such as IATF 16949. See Audit Management →

PPAP (Production Part Approval Process)

A formal supplier submission process published by AIAG and required by most automotive OEMs (Ford, GM, Stellantis) and by the Tier 1 suppliers that serve them. PPAP defines the evidence a supplier must provide to demonstrate that their production process can consistently produce parts meeting all specifications, before mass production begins or after any significant process change.

A standard PPAP package contains 18 elements, including the Design FMEA, Process FMEA, Process Flow Diagram, Control Plan, Measurement System Analysis, dimensional results, initial process capability studies (typically Ppk), and a Part Submission Warrant (PSW) signed by the supplier and approved by the customer.

PPAP submissions are graded by level. Level 1 is a warrant only. Level 3 (warrant plus full supporting data) is the most common default. Level 5 requires the customer to review the data on-site at the supplier. Level requirements are set by the customer and vary by part criticality.

PPAP is automotive in origin. Aerospace has its own equivalent built around AS9102 First Article Inspection.

APQP (Advanced Product Quality Planning)

The five-phase product development framework that produces the deliverables that eventually become a PPAP package. Also published by AIAG.

Phase 1: Plan and Define Program. Phase 2: Product Design and Development. Phase 3: Process Design and Development. Phase 4: Product and Process Validation. Phase 5: Feedback, Assessment, and Corrective Action.

APQP and PPAP are often confused. APQP is the up-front planning and development process. PPAP is the formal submission of evidence at the end of APQP that the process is ready to produce.

FMEA (Failure Mode and Effects Analysis)

A systematic method for identifying potential failure modes in a product or process, evaluating their effects, and prioritizing them for action. Two main types are used in manufacturing.

DFMEA (Design FMEA) focuses on the product design. What could fail about the design itself, what would the consequence be, and how would the design need to change.

PFMEA (Process FMEA) focuses on the manufacturing process. What could go wrong in production, where the controls are weakest, and what process changes would reduce the risk.

The traditional FMEA approach scored each failure mode by Severity (S), Occurrence (O), and Detection (D), each on a 1-to-10 scale. The product (S × O × D) was the Risk Priority Number (RPN), used to prioritize action.

The 2019 AIAG-VDA FMEA Handbook replaced RPN with Action Priority (AP) categories of High, Medium, and Low, based on specific combinations of S, O, and D rather than their multiplicative product. The change reflects criticism that RPN allowed low-severity, high-frequency issues to outrank high-severity, low-frequency ones.
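
The RPN arithmetic, and the ranking problem that motivated the change, can be shown with two made-up failure modes. The scores are invented for illustration. The AP lookup tables are not reproduced here, so the sketch covers only the traditional calculation.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Traditional Risk Priority Number: S x O x D, each scored 1 to 10."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("scores must be between 1 and 10")
    return severity * occurrence * detection


# Hypothetical low-severity, high-frequency issue.
cosmetic_scuff = rpn(severity=3, occurrence=9, detection=6)
# Hypothetical high-severity, low-frequency issue.
brake_line_leak = rpn(severity=10, occurrence=2, detection=4)

print(cosmetic_scuff, brake_line_leak)  # 162 80
```

Ranked purely by RPN, the cosmetic issue outranks the safety-critical one, which is the behaviour the Action Priority approach was introduced to avoid.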

Gauge R&R (Repeatability and Reproducibility)

A study that determines how much of the observed variation in a measurement is caused by the measurement system rather than the parts being measured.

Repeatability (sometimes called Equipment Variation) is the variation observed when the same operator measures the same part multiple times with the same gauge. It reflects gauge precision.

Reproducibility (sometimes called Appraiser Variation) is the variation observed between different operators measuring the same part with the same gauge. It reflects how operator-dependent the measurement is.

A Gauge R&R study reports the combined contribution of these two sources as a percentage of total variation or as a percentage of the specification tolerance. Common acceptance thresholds are below 10% (acceptable), 10 to 30% (marginal, sometimes acceptable depending on the criticality of the application), and above 30% (unacceptable).

A high Gauge R&R is an important early signal that quality decisions made downstream are unreliable. If 25% of the observed variation comes from the measurement system, the capability numbers built on that data are distorted: measurement noise widens the observed spread, so the indices describe the process and the gauge together rather than the process alone.
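
As a sketch of how the percentage is formed, the function below combines repeatability (EV) and reproducibility (AV) in quadrature and expresses the result against total variation, with all inputs taken as standard deviations. The variable names and sample values are assumptions. A real study would first estimate these components from a crossed experiment.

```python
from math import sqrt


def percent_grr(ev: float, av: float, pv: float) -> float:
    """%GRR against total variation; ev, av, pv are standard deviations.

    ev: equipment variation (repeatability)
    av: appraiser variation (reproducibility)
    pv: part-to-part variation
    """
    grr = sqrt(ev**2 + av**2)
    total = sqrt(grr**2 + pv**2)
    return 100 * grr / total


def classify(pct: float) -> str:
    """Apply the common 10% / 30% acceptance thresholds."""
    if pct < 10:
        return "acceptable"
    if pct <= 30:
        return "marginal"
    return "unacceptable"


pct = percent_grr(ev=0.03, av=0.04, pv=0.12)
print(round(pct, 1), classify(pct))
```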

MSA (Measurement System Analysis)

The umbrella term for studies that evaluate measurement system performance. Gauge R&R is one type of MSA, focused on variation. MSA also covers bias (systematic offset between the measured value and the true value), linearity (whether bias varies across the measurement range), stability (whether the system drifts over time), and resolution (whether the gauge can detect the variation that matters for the specification). AIAG publishes the MSA reference manual as one of the core automotive quality tools alongside FMEA, SPC, PPAP, and APQP. IATF 16949, like its predecessor ISO/TS 16949, requires MSA for any measurement system used to make quality decisions.

8D Report

An eight-step problem-solving methodology developed at Ford in the 1980s and now widely used in automotive and adjacent industries. The 8D report is the formal document produced when applying the method, typically in response to a customer quality complaint.

D1 establishes the team. D2 describes the problem in measurable terms. D3 implements interim containment to protect the customer while root cause is being investigated. D4 identifies and verifies root causes. D5 selects and verifies permanent corrective actions. D6 implements those actions. D7 prevents recurrence by identifying and correcting the systemic conditions that allowed the problem. D8 recognizes the team's contribution.

Many customers (Tier 1 and OEM automotive in particular) require an 8D submission within a defined window after a quality issue. A weak 8D is one that confuses containment with corrective action. Containment is what stops the bleed in the short term. Corrective action is what changes the system so the same problem cannot happen again.

5 Whys

A root-cause analysis technique attributed to Sakichi Toyoda and central to the Toyota Production System. Ask "why" repeatedly to move from a visible symptom to an underlying systemic cause.

The number five is heuristic. Sometimes three iterations are enough. Sometimes the chain runs longer. The goal is to keep asking until the answer points to a process or system condition rather than to an individual's behaviour. "The operator forgot to torque the bolt" is rarely the root cause. "The work instruction allowed the torque step to be skipped without verification" is closer.

The 5 Whys is most useful for problems with a clear causal chain. It is a poor fit for problems with multiple interacting causes, where a fishbone diagram or formal FMEA produces better results.

NCR (Non-Conformance Report)

A documented record of a deviation from a specified requirement. Three terms are used in this space and are not interchangeable.

A non-conformance (or non-conformity) is the underlying event: material out of specification, a process step skipped, a finished part outside dimensional tolerance, a missing record.

A defect is a non-conformance that affects the function, performance, or safety of the product. All defects are non-conformances. Not all non-conformances are defects.

An NCR is the document used to record and track a non-conformance through containment, disposition, and resolution.

ISO 9001 and IATF 16949 both require that every non-conformance be recorded, dispositioned, and trended over time so that recurring issues become visible.

CAPA (Corrective and Preventive Action)

The systematic process for eliminating the causes of non-conformances, both those that have already occurred and those that have been identified as risks.

Corrective action addresses a non-conformance that has already occurred, with the goal of preventing it from recurring. The action targets the root cause, not the symptom.

Preventive action addresses a potential non-conformance that has not yet occurred but has been identified through risk analysis or trend monitoring.

CAPA is heavily formalized in regulated industries. ISO 13485 (medical devices) and 21 CFR Part 820 (FDA) require documented CAPA processes with defined timelines and effectiveness verification. ISO 9001:2015 effectively replaced standalone "preventive action" with broader risk-based thinking, though many organizations retain the CAPA structure for consistency with their other certifications.

The most common CAPA failure mode is conflating containment with corrective action. Sorting suspect material is containment. Changing the work instruction so the suspect material cannot be produced again is corrective action. Closing a CAPA on the basis of containment alone leaves the system unchanged.

FAI (First Article Inspection)

A formal inspection of the first part produced under new tooling, a new process, or a significant process change. The inspection verifies that the production process actually produces parts meeting every dimensional and functional requirement on the drawing.

FAI is the standard process-validation step in aerospace, where it is governed by AS9102. It is also used in defense manufacturing and increasingly in supplemental automotive processes that fall outside PPAP.

FAI and PPAP are often confused. PPAP is a comprehensive package that demonstrates an entire process is ready to produce in volume, including capability studies, MSA, FMEAs, and a control plan. FAI is specifically a dimensional and feature-by-feature verification of the first piece off the new process. A PPAP submission typically includes FAI-style dimensional results as one of its 18 elements; FAI on its own does not satisfy PPAP.

Process Variable (PV)

A measurable characteristic of a manufacturing process that is captured for control, monitoring, or quality verification. Common examples include torque, pressure, temperature, dimension, weight, time, and count. The term originates in process control theory, where a process variable is the measured value being controlled in a feedback loop. In manufacturing reporting it is used more broadly to mean any data point captured during production, whether for SPC, traceability, quality verification, or operator confirmation.

Process variables fall into three operational categories.

Numeric (continuous) variables hold ranged measurements such as a temperature reading of 425.3°F or a torque value of 18.2 Nm. They typically carry specification limits (USL and LSL), control limits (UCL and LCL), and a target value, and they are the basis for SPC analysis.

Binary or categorical variables hold pass/fail, OK/NOK, or yes/no outcomes. They drive go/no-go decisions and are used for verification checks rather than statistical analysis.

Identifier variables hold values such as serial numbers, lot numbers, or work order numbers. They serve as keys for traceability rather than as quality measurements in their own right.

The term carries a useful distinction. "Process variable" in casual use means any measurable. In a structured quality system, a process variable specifically refers to a configured data point with an associated capture rule, units, decimal precision, and a defined set of limits. See Process Variable Collection →
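To make the "configured data point" idea concrete, here is a minimal Python sketch of a numeric process variable with units, decimal precision, and specification limits. The class name, field names, and the torque limits are all hypothetical, chosen for illustration only; they do not describe any particular product's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericProcessVariable:
    """A configured numeric data point (all names here are hypothetical)."""
    name: str
    units: str
    decimals: int                    # decimal precision applied at capture
    lsl: float                       # lower specification limit
    usl: float                       # upper specification limit
    target: Optional[float] = None   # nominal value, if one is defined

    def capture(self, raw: float) -> dict:
        """Round a raw reading to the configured precision and check it against the spec limits."""
        value = round(raw, self.decimals)
        return {
            "name": self.name,
            "value": value,
            "units": self.units,
            "in_spec": self.lsl <= value <= self.usl,
        }


# Illustrative limits only; real limits come from the drawing or control plan.
torque = NumericProcessVariable("bolt_torque", "Nm", 1, lsl=17.0, usl=19.0, target=18.0)
ok_reading = torque.capture(18.23)
bad_reading = torque.capture(19.46)
```

A casual "measurable" would be just the raw number; the configured variable is what lets every captured reading carry its units, precision, and pass/fail status.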

Quality Alert / Quality Alert with Lockout

A digital notification displayed at a workstation that requires operator acknowledgement and is logged with operator ID and timestamp. Common uses include communicating containment notices on suspect material, alerting operators to a known issue or work-around, confirming a temporary specification change has been read, and verifying that updated work instructions have been received before production continues.

Two variations are typically distinguished.

A Quality Alert requires acknowledgement but does not interrupt production. The operator clicks through a confirmation screen, the acknowledgement is recorded, and the line continues running.

A Quality Alert with Lockout requires acknowledgement before the machine will continue to produce. The lockout is used when the alert carries safety, regulatory, or quality weight that makes acknowledgement non-optional.

The mechanism replaces email-based or paper-based quality communication with a digital, traceable, enforceable record. Audits can confirm which operators were exposed to which alerts and when. Investigations can verify whether a specific alert was active during the production of a specific lot.
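The difference between the two variations can be sketched in a few lines of Python. This is an illustrative model only: the class, its fields, and the per-operator acknowledgement rule are assumptions, not a description of how any specific system implements lockout.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QualityAlert:
    """Hypothetical model of a workstation alert; not any vendor's schema."""
    alert_id: str
    message: str
    lockout: bool = False                               # True = Quality Alert with Lockout
    acknowledgements: list = field(default_factory=list)

    def acknowledge(self, operator_id: str) -> None:
        # Record who acknowledged and when; this list is the audit trail.
        self.acknowledgements.append((operator_id, datetime.now(timezone.utc)))

    def production_allowed(self, operator_id: str) -> bool:
        # A plain alert never blocks production.
        if not self.lockout:
            return True
        # A lockout alert blocks until this operator has acknowledged it.
        return any(op == operator_id for op, _ in self.acknowledgements)


alert = QualityAlert("QA-001", "Suspect lot: inspect before use", lockout=True)
blocked_before = not alert.production_allowed("op-17")
alert.acknowledge("op-17")
allowed_after = alert.production_allowed("op-17")
```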

Maintenance & Reliability

CMMS (Computerized Maintenance Management System)

Software for managing maintenance work orders, preventive maintenance schedules, parts inventory, and equipment history. See 10in6 CMMS →

Predictive vs Preventive vs Reactive Maintenance

The three principal maintenance strategies. The differences lie in what triggers the work and how much is known about the equipment's condition before it begins.

Reactive maintenance is performed after a failure has occurred. When applied as a deliberate strategy on appropriate equipment, it is referred to more precisely as run-to-failure. When it is the result of neglect, it has the same effect with none of the intent.

Preventive maintenance (PM) is scheduled work based on time intervals or cycle counts. Every 500 operating hours, every six months, every 100,000 cycles. The work is performed regardless of the equipment's actual condition. PM trades the cost of unnecessary maintenance for the avoidance of failures during the interval. The risk is that failures occur between intervals, or that the maintenance intervention itself introduces new faults.

Predictive maintenance (PdM) is triggered by condition data indicating impending failure. The work is performed when monitoring signals (vibration, temperature, lubricant condition, current draw, acoustic emission) cross thresholds that historical data associates with degradation. PdM is more efficient than PM when the failure modes have measurable precursors, and more cost-effective than reactive maintenance for any failure with significant consequences.

A fourth category, prescriptive maintenance, has emerged with broader sensor coverage and modeling. Prescriptive systems go beyond predicting failure to recommending a specific course of action. The boundary between predictive and prescriptive is currently a matter of vendor positioning more than methodology.

Most plants run a mix of the three principal strategies. The decision of which strategy fits which asset is the central question that Reliability-Centered Maintenance is designed to answer.

Preventive Maintenance (PM)

Scheduled maintenance performed at regular intervals, by time or cycle count, to prevent equipment failure before it occurs. Work triggered by equipment condition is covered under predictive maintenance. See Maintenance Checks →

Condition-Based Monitoring (CBM)

The continuous or periodic measurement of equipment health indicators to detect changes that signal degradation or impending failure. Common CBM signals include vibration analysis, thermography, oil and lubricant analysis, ultrasonic testing, motor current signature analysis, and acoustic emission monitoring.

CBM is the measurement layer underneath predictive maintenance. CBM data describes the current state of the equipment. Predictive maintenance applies models to that data to forecast when intervention will be required. The two terms are often used interchangeably, but they describe different roles in the same system.

The practical limitation of CBM is that it only detects failure modes that produce a measurable precursor. Many failure modes do not. Bearings, motors, and rotating equipment are good CBM candidates because their failure modes generate detectable vibration and thermal signatures. Some electronic and software failures produce no useful precursor and remain in the domain of preventive replacement or reactive repair.

RCM (Reliability-Centered Maintenance)

A structured methodology for determining the appropriate maintenance strategy for each piece of equipment based on its failure modes, the consequences of those failures, and the availability of effective maintenance tasks.

RCM originated in the commercial airline industry in the 1960s, largely through work at United Airlines. It was formalized by F. Stanley Nowlan and Howard F. Heap of United Airlines in their 1978 report, Reliability-Centered Maintenance, prepared for the US Department of Defense. The current standard reference is SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance Processes.

The methodology answers seven questions for each asset:

What are the functions of the asset and the associated performance standards?
In what ways can the asset fail to fulfill those functions?
What causes each functional failure?
What happens when each failure occurs?
What is the consequence of each failure (safety, environmental, operational, economic)?
What can be done to prevent or predict each failure?
What should be done if a suitable proactive task cannot be found?

The output is a maintenance strategy assigned to each failure mode: predictive where condition indicators exist, preventive where the failure pattern is genuinely time- or cycle-dependent, run-to-failure where the consequences are tolerable, or a redesign requirement where no acceptable strategy exists.
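The four possible outcomes can be read as a simple decision cascade. The Python sketch below is a deliberate simplification that follows the order of the sentence above; the function name and its three boolean inputs are hypothetical. A real RCM decision diagram also weighs consequence categories and task cost-effectiveness.

```python
def assign_strategy(has_condition_indicator: bool,
                    age_related: bool,
                    consequences_tolerable: bool) -> str:
    """Pick a maintenance strategy for one failure mode (simplified, illustrative logic)."""
    if has_condition_indicator:
        return "predictive"        # a measurable precursor exists
    if age_related:
        return "preventive"        # failure is genuinely time- or cycle-dependent
    if consequences_tolerable:
        return "run-to-failure"    # no effective task, but failure is acceptable
    return "redesign"              # no acceptable strategy exists


# Hypothetical failure modes, for illustration only.
bearing_wear = assign_strategy(True, True, False)
belt_fatigue = assign_strategy(False, True, False)
indicator_lamp = assign_strategy(False, False, True)
random_safety_fault = assign_strategy(False, False, False)
```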

RCM emerged because empirical studies of airline equipment found that the bathtub failure pattern, which had been the implicit basis for time-based PM, applied to only a small fraction of components. Time-based maintenance on equipment that fails randomly does not improve reliability and frequently introduces failures from the maintenance activity itself.

TPM (Total Productive Maintenance)

A maintenance and operations philosophy developed by Seiichi Nakajima at the Japan Institute of Plant Maintenance (JIPM) in the 1970s and 1980s. TPM frames equipment care as a shared responsibility across operations, maintenance, and engineering, rather than as an activity owned only by the maintenance department.

The standard JIPM framework defines eight pillars:

Autonomous Maintenance (operators perform basic care of their own equipment)
Planned Maintenance (scheduled PM and PdM by the maintenance team)
Quality Maintenance (eliminating defects through equipment condition)
Focused Improvement (Kaizen activity targeting chronic losses)
Early Equipment Management (designing maintainability into new equipment)
Education and Training
Safety, Health, and Environment
TPM in Office (extending the discipline into administrative functions)

OEE is the central performance metric in TPM, and the Six Big Losses are its loss framework.

RCM and TPM are sometimes treated as competing methodologies. They are better understood as complementary. RCM is analytical and answers the question of which maintenance strategy fits which failure mode. TPM is cultural and addresses how the organization actually executes those strategies day to day. Most mature programs use both.

Run-to-Failure (RTF)

A deliberate maintenance strategy in which equipment is operated until it fails, at which point it is repaired or replaced. Distinct from accidental neglect, which produces the same outcome without the strategic decision.

Run-to-failure is appropriate when the cost of failure is low, the consequences are tolerable (no safety, environmental, or significant production impact), redundancy is in place to absorb the failure, or no effective preventive or predictive task is available. It is inappropriate when failure introduces safety risk, environmental release, significant production loss, or cascading damage to other equipment.

Under RCM analysis, run-to-failure is one of the four valid outcomes of the seven-question process. Recognizing it as a strategic choice rather than a maintenance failure is a useful step for organizations that historically labeled all unplanned failures as preventable.

Bathtub Curve

A reliability engineering model that describes the failure rate of equipment over its service life as having three distinct phases.

The infant mortality period covers early life and shows a high but declining failure rate. Failures in this period are typically caused by manufacturing defects, installation errors, or commissioning issues that surface under operating stress.

The useful life period follows. Failure rate is low and approximately constant. Failures occur randomly, driven by external events, transients, and the inherent variability of the components.

The wearout period covers end of life. Failure rate increases over time as components age, lubricants degrade, and cumulative stress takes effect.

The shape resembles the cross-section of a bathtub, with high failure rates at both ends and a flat low rate in the middle.

The bathtub curve is the implicit model behind traditional time-based preventive maintenance. PM intervals were originally chosen to intercept components before they entered the wearout phase. The reason RCM was developed is that empirical studies in the airline industry found that only about 4% of components actually exhibit a bathtub failure pattern. Roughly 89% showed no wearout phase at all, with patterns dominated by random failure or by infant mortality followed by a constant failure rate. For equipment that fails randomly, calendar-based PM does not improve reliability and often introduces new failures through intervention.
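One common way to model the three phases numerically is the Weibull hazard function, h(t) = (β/η)(t/η)^(β−1). The Weibull model is an assumption added here for illustration; it is not named in the text above. A shape parameter β below 1 gives a falling failure rate (infant mortality), β equal to 1 gives a constant rate (useful life), and β above 1 gives a rising rate (wearout). The parameter values below are arbitrary.

```python
import math


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def hazard_trend(beta: float, eta: float = 1000.0) -> str:
    """Compare the hazard early and late in life to classify the trend."""
    early = weibull_hazard(100.0, beta, eta)
    late = weibull_hazard(900.0, beta, eta)
    if math.isclose(early, late):
        return "constant"
    return "decreasing" if late < early else "increasing"


infant_mortality = hazard_trend(0.5)   # beta < 1
useful_life = hazard_trend(1.0)        # beta == 1
wearout = hazard_trend(3.0)            # beta > 1
```

Time-based replacement only helps in the β > 1 regime, which is the quantitative form of the point about random failures above.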

Work Order

A documented task assigned to a technician for maintenance, repair, or inspection. CMMS systems create, assign, and track work orders digitally with full history and parts consumption records. See Work Orders →

MTBF (Mean Time Between Failures)

Average time between unplanned equipment failures. A key indicator of equipment reliability. Tracked automatically from downtime event data.

MTTR (Mean Time to Repair)

Average time to restore equipment to operation after a failure. A key indicator of maintenance responsiveness. Calculated from downtime timestamps.
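As a rough illustration of how MTBF and MTTR fall out of downtime data, the Python sketch below uses one common convention: MTBF as total operating time divided by the number of failures, and MTTR as total repair time divided by the number of failures. The function name and the sample figures are hypothetical, and other conventions for MTBF exist.

```python
from typing import List, Tuple


def mtbf_mttr(operating_hours: float, repair_hours: List[float]) -> Tuple[float, float]:
    """Return (MTBF, MTTR) in hours; each entry in repair_hours is one unplanned failure."""
    failures = len(repair_hours)
    if failures == 0:
        raise ValueError("no failures recorded; MTBF and MTTR are undefined")
    mtbf = operating_hours / failures      # average run time per failure
    mttr = sum(repair_hours) / failures    # average time to restore
    return mtbf, mttr


# Illustrative month: 450 operating hours, three failures.
mtbf, mttr = mtbf_mttr(450.0, [1.5, 0.5, 1.0])
```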

MTTD (Mean Time to Detect)

The average time between when a failure occurs and when it is detected. Distinct from MTBF (which measures intervals between failures) and from MTTR (which measures repair duration after the failure has been recognized).

MTTD reflects the effectiveness of instrumentation and monitoring. A failure that the system catches automatically through fault codes or threshold breaches has a short MTTD. A failure that operators discover only when product quality drops or when they walk past the line has a long MTTD.

Long MTTD is a signal that the detection layer is weak. The remediation is in instrumentation, alerting, and condition monitoring rather than in repair process.

MTTA (Mean Time to Acknowledge)

The average time between when an alert or fault is detected and when someone acknowledges it, typically by responding to the alert or beginning investigation.

MTTA fills the gap between MTTD and MTTR in the operational chain. A failure can be detected immediately, acknowledged in seconds, and repaired in an hour, or it can be detected immediately, sit unacknowledged for thirty minutes, and then be repaired in an hour. The total downtime is very different. Each step has a separate metric because each is improved by different actions.

The full chain of maintenance time metrics is MTTD (detection lag), MTTA (acknowledgement lag), MTTR (repair duration), and MTBF (time between failures). Long MTTA usually points to staffing, escalation, or alert fatigue. Long MTTR usually points to training, parts availability, or diagnostic difficulty. Long MTTD usually points to instrumentation. The metrics are diagnostic by themselves and most useful when read together.
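The two scenarios described above can be worked through in a short Python sketch. It assumes each downtime event carries four timestamps (occurred, detected, acknowledged, restored) and that repair time is counted from acknowledgement to restoration; both the record layout and that boundary are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def chain_metrics(events):
    """Average each stage of the response chain, in minutes.

    events: list of (occurred, detected, acknowledged, restored) datetimes.
    """
    def avg_minutes(start_idx, end_idx):
        return mean((e[end_idx] - e[start_idx]).total_seconds() / 60 for e in events)

    return {
        "MTTD": avg_minutes(0, 1),   # failure occurs -> detected
        "MTTA": avg_minutes(1, 2),   # detected -> acknowledged
        "MTTR": avg_minutes(2, 3),   # acknowledged -> restored (assumed boundary)
    }


t0 = datetime(2024, 1, 1, 8, 0)   # arbitrary illustrative start time
# Detected immediately, acknowledged in 30 seconds, repaired in an hour.
fast = (t0, t0, t0 + timedelta(seconds=30), t0 + timedelta(seconds=30, hours=1))
# Detected immediately, unacknowledged for 30 minutes, repaired in an hour.
slow = (t0, t0, t0 + timedelta(minutes=30), t0 + timedelta(minutes=30, hours=1))
metrics = chain_metrics([fast, slow])
```

Both events have the same repair duration, so MTTR alone would hide the difference; it only shows up in MTTA.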

Lean & Continuous Improvement

Lean Manufacturing

A production methodology developed by Toyota in the post-war period and codified for Western audiences in the late 1980s and early 1990s. The central principle is the systematic elimination of waste, called muda in the original Japanese, with the goal of producing only what the customer needs, only when it is needed, and only in the quantities required.

Lean identifies seven categories of waste, traditionally called the seven muda.

Overproduction — making more than the next process needs
Waiting — idle time during production
Transport — unnecessary movement of materials
Over-processing — doing more work than the specification requires
Inventory — material held in excess of what is needed
Motion — unnecessary movement of operators
Defects — production of non-conforming output

An eighth waste, the underutilization of human talent and creativity, is commonly added in modern formulations.

Lean is supported by a set of associated practices, each with its own discipline: 5S, Kanban, Poka-Yoke, Standard Work, Heijunka, Jidoka, and Continuous Flow. These are tools within the methodology rather than the methodology itself. Implementing the tools without the underlying culture and management commitment often produces visible improvements that fade as soon as attention shifts.

The most cited Western reference is The Machine That Changed the World by James Womack, Daniel Jones, and Daniel Roos (1990), which popularized the term "lean production" and brought the methodology to wide attention outside Japan. The earlier writings of Taiichi Ohno, the engineer most associated with the original Toyota Production System, remain the primary sources for the underlying philosophy.

5S

A workplace organization methodology developed in Japan and widely adopted as a foundation for lean manufacturing. The five terms are typically given in their Japanese form alongside an English translation.

Seiri (Sort) separates needed items from unneeded ones and removes what is not required for the work being done at that location.
Seiton (Set in order, or Straighten) arranges remaining items so the right tool, part, or document is at the right place at the right time.
Seiso (Shine) establishes regular cleaning of the workspace and equipment, with cleaning serving as a form of inspection.
Seiketsu (Standardize) sets standards for the first three steps so they are applied consistently across shifts, operators, and locations.
Shitsuke (Sustain) maintains the discipline through training, audits, and management commitment.

A sixth S for Safety is commonly added in Western implementations, producing 6S programs.

5S is the most visible expression of a plant's lean discipline. An operation where tools are scattered and aisles are obstructed is unlikely to execute well on more advanced lean methods. The reverse does not follow automatically. A well-organized plant is not necessarily running a mature lean operation, though it has cleared a precondition for one.

The fifth S, Sustain, is the step that distinguishes durable 5S programs from cosmetic ones. Programs that treat 5S as a one-time clean-up event tend to produce visible short-term improvement that fades as attention shifts elsewhere.

Kanban

A visual scheduling system for controlling inventory and production flow. In manufacturing software, Kanban signals trigger replenishment of parts or materials when inventory reaches a set threshold.
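In software terms, the threshold trigger reduces to a comparison between on-hand quantity and a configured reorder point. The Python sketch below is a minimal illustration; the function name, part numbers, and quantities are hypothetical.

```python
def replenishment_signals(on_hand: dict, reorder_point: dict) -> list:
    """Return the parts whose on-hand quantity is at or below their reorder point."""
    return sorted(
        part
        for part, threshold in reorder_point.items()
        if on_hand.get(part, 0) <= threshold   # a part with no stock record counts as zero
    )


# Hypothetical line-side inventory and thresholds.
on_hand = {"bracket-A": 40, "screw-M6": 120, "gasket-12": 8}
reorder_point = {"bracket-A": 50, "screw-M6": 100, "gasket-12": 8, "clip-3": 20}
signals = replenishment_signals(on_hand, reorder_point)
```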

Poka-Yoke

Error-proofing — designing processes or checks so that defects are physically impossible or immediately detected. Scan-to-confirm and required quality check completion are common forms of Poka-Yoke in digital quality systems.

Kaizen

Japanese term for "continuous improvement." A philosophy of making small, incremental improvements to processes on an ongoing basis, rather than large infrequent overhauls.

Pareto Analysis

A diagnostic technique based on the Pareto principle, the empirical observation that in many systems a small number of causes account for the majority of effects. The principle is named for the Italian economist Vilfredo Pareto and was introduced into quality management practice by Joseph Juran, who phrased it as the distinction between "the vital few and the trivial many."

A Pareto chart is a bar chart with categories ordered from largest to smallest contributor, often paired with a cumulative percentage line. The chart visually identifies the small number of causes responsible for most of the impact and separates them from the many causes that contribute marginally.

Common applications in manufacturing include downtime by reason code (which stoppages cost the most production time), scrap by reason code (which defects drive the most material loss), quality issues by part or operation (where defects concentrate), and customer complaints by category (which issues generate the most complaints).

Two pitfalls are worth flagging. Pareto is a prioritization tool rather than an explanation. A reason code dominating the chart shows where to investigate, not why the problem is occurring. The follow-up work, typically a 5 Whys or a fishbone diagram, is what produces a useful answer.

The second pitfall is the choice of axis. A Pareto chart by event count and a Pareto chart by impact (duration, dollar value, scrap weight) frequently produce different rankings. Five short stops totaling ten minutes will rank above one eight-hour failure by event count and well below it by duration. The right axis depends on what the analysis is being used to prioritize.
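The axis effect is easy to reproduce. The Python sketch below builds a Pareto table (category, total, cumulative percentage) from downtime records, ranked either by event count or by duration. The function name, record layout, and reason codes are hypothetical; the sample data mirrors the five-short-stops example above.

```python
from collections import defaultdict


def pareto(records, by="count"):
    """Rank reason codes by event count or by total minutes.

    records: list of (reason, minutes) tuples.
    Returns rows of (reason, total, cumulative_percent), largest contributor first.
    """
    if by not in ("count", "duration"):
        raise ValueError("by must be 'count' or 'duration'")
    totals = defaultdict(float)
    for reason, minutes in records:
        totals[reason] += 1 if by == "count" else minutes
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0.0
    for reason, value in ranked:
        running += value
        rows.append((reason, value, round(100 * running / grand_total, 1)))
    return rows


# Five two-minute jams and one eight-hour gearbox failure.
records = [("jam", 2)] * 5 + [("gearbox", 480)]
by_count = pareto(records, by="count")
by_duration = pareto(records, by="duration")
```

Ranked by count, the jams lead; ranked by duration, the gearbox failure accounts for nearly all of the lost time.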

Systems & Protocols

MES (Manufacturing Execution System): Software that connects to production equipment and personnel to collect real-time data, track production against schedules, and report on OEE, downtime, and quality. See 10in6 MES →
DQS (Digital Quality System): Software for digitizing quality check workflows, SPC, visual inspection, track & trace, and audit-ready records. Replaces paper-based quality check sheets. See 10in6 DQS →
PLC (Programmable Logic Controller): An industrial computer that controls machine automation. Connects to MES platforms via OPC UA, Modbus, EtherNet/IP, and other industrial protocols. Common manufacturers include Allen-Bradley, Siemens, Mitsubishi, Omron, and Beckhoff. See PLC Connectivity →
SCADA (Supervisory Control and Data Acquisition): A system for monitoring and controlling industrial processes, typically at a plant-wide or line level. SCADA and MES occupy adjacent layers in the ISA-95 reference architecture and are sometimes integrated.
OPC UA (OPC Unified Architecture): The modern, platform-agnostic standard for industrial equipment communication. Preferred protocol for connecting to modern PLCs and controllers.
ERP (Enterprise Resource Planning): Business management software covering finance, HR, supply chain, and manufacturing planning. Integrates with MES platforms to exchange production schedules, work orders, and as-built quantities. Examples include SAP, Oracle, QAD, Epicor, and Sage. See ERP Integration →
IIoT (Industrial Internet of Things): The network of connected industrial devices, sensors, and machines that share data. The umbrella term covering modern shop-floor connectivity, edge computing, and the data layer underneath production reporting and analytics.

Compliance & Customs

DRP (Duty Relief Program): A Canada Border Services Agency program that can allow eligible importers to bring goods into Canada without paying duties when those goods are later exported, either in the same condition or after being used in further processing. Manufacturers operating under DRP need traceable records connecting customs transactions to received material, production, scrap, inventory, and exports. The official program description is on the CBSA Duty Relief Program page. See DRP Traceability →
CBSA (Canada Border Services Agency): The federal agency responsible for enforcing customs, immigration, and border-related legislation in Canada. CBSA administers the Duty Relief Program and is the authority that defines the records and reporting required to support it.
CAD (Commercial Accounting Declaration) / B3: The declaration used to account for imported goods entering Canada. It replaced the older Form B3 and is still sometimes referred to as the B3. Each CAD has a transaction number and one or more line numbers identifying the imported goods. The CAD transaction number is the key reference that needs to be linked back to physical received material under the Duty Relief Program.
PARS (Pre-Arrival Review System): A CBSA program that allows customs brokers to submit release information before a shipment arrives at the border, so customs clearance can be processed in advance. The PARS number is used by carriers and brokers during border clearance and links to the eventual CAD transaction.
Diversion (under DRP): Imported material that does not end up exported as finished goods — including process scrap, flash, shavings, rejected parts, domestic sale, destruction, missing inventory, or material that exceeded the required DRP timeline. Diverted material may be subject to duty and GST owing and must be reported.
Duty Drawback: A separate CBSA program from the Duty Relief Program. Under Drawback, the importer pays duties on entry and then claims a refund after the goods are exported, used in producing exported goods, or destroyed. DRP defers duties at the import stage; Drawback refunds duties after export. Some manufacturers operate under both depending on the goods, the timing, and how their broker structures the filings.
HS Classification (Harmonized System): The international standard for classifying traded products. HS codes determine the tariff treatment, duty rate, and reporting requirements for imported goods. Each CAD line carries an HS classification.
VFD / VFCC (Value for Duty / Value for Currency Conversion): The declared value of imported goods used to calculate duties and taxes. VFCC is the value in the supplier's currency; VFD is the converted Canadian-dollar value used by CBSA for duty calculation.
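The relationship between the two values is a currency conversion. The Python sketch below shows the arithmetic only; the exchange rate is a made-up figure, and rounding to the cent with half-up rounding is an assumption. Which rate applies and how it is rounded are set by CBSA rules and should be confirmed with a customs broker.

```python
from decimal import Decimal, ROUND_HALF_UP


def value_for_duty(vfcc: Decimal, exchange_rate: Decimal) -> Decimal:
    """Convert a supplier-currency value (VFCC) to Canadian dollars (VFD)."""
    # Rounding to the cent is an assumption made for this sketch.
    return (vfcc * exchange_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical invoice: 1,000.00 in the supplier's currency at an illustrative rate of 1.3500.
vfd = value_for_duty(Decimal("1000.00"), Decimal("1.3500"))
```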