Gauge Repeatability and Reproducibility (GR&R): What It Is and How to Run One

Before you trust your measurement data, you need to know whether your measurement system is actually measuring the process or just measuring itself. A gauge that consumes half of your total variation budget is not a useful gauge, regardless of how frequently it is calibrated. Calibration confirms that the gauge reads accurately at specific reference points. It does not tell you how much variation the gauge introduces when an operator uses it repeatedly on production parts.

A GR&R study quantifies how much of the observed variation in your data comes from the gauge and the operators who use it, versus actual part-to-part variation from the process. If that number is too large, the process improvements you are tracking may not be real. You may be tracking measurement noise, not process signal. The AIAG Measurement System Analysis (MSA) manual and ISO 22514-7 both provide the statistical framework for conducting and interpreting GR&R studies.

What repeatability and reproducibility measure

Repeatability is the variation you observe when the same operator measures the same part multiple times with the same gauge under the same conditions. Every measurement is made under conditions as identical as possible: same operator, same part, same gauge, same environmental conditions, same measurement procedure. The variation that remains is the gauge's inherent variation, also called equipment variation (EV). A gauge with poor repeatability produces different readings every time it measures the same thing, even in the hands of the same skilled operator.

Reproducibility is the variation between different operators measuring the same part with the same gauge. It is the operator-to-operator inconsistency in the measurement system. Two operators measuring the same part with the same gauge should, in principle, get the same result. When they do not, the difference is reproducibility variation, also called appraiser variation (AV). Reproducibility problems usually point to inconsistencies in how operators set up the measurement, how they position the part, how much force they apply, or how they read the gauge display.

Total GR&R is the combined effect of repeatability and reproducibility. It represents the total contribution of the measurement system to observed variation. In the standard model the two sources are treated as independent, so their variances add: the GR&R standard deviation is the square root of EV squared plus AV squared. The GR&R result is reported as a percentage, either as the ratio of the measurement system standard deviation to the total observed standard deviation (%study variation) or as the ratio of the measurement system spread to the specification tolerance width (%tolerance). A low percentage means the measurement system is contributing a small share of the observed variation. A high percentage means the measurement system is consuming a large share, leaving less of the variation budget available to see the actual process.
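
The arithmetic behind the two percentages can be sketched in a few lines. This is a minimal illustration, assuming EV, AV, and part variation (PV) are already available as standard deviations and following the AIAG convention of a 6-sigma spread for %tolerance (some older references use 5.15 sigma). The function name and the example numbers are hypothetical.

```python
import math


def percent_grr(ev, av, pv, tolerance=None, spread=6.0):
    """Combine variance components into %GR&R (hypothetical helper).

    ev, av, pv are standard deviations for equipment variation,
    appraiser variation, and part variation. Variances add, so the
    standard deviations are combined in quadrature with math.hypot.
    """
    grr = math.hypot(ev, av)   # sqrt(EV^2 + AV^2)
    tv = math.hypot(grr, pv)   # total variation: sqrt(GRR^2 + PV^2)
    result = {
        "GRR": grr,
        "TV": tv,
        "pct_study_var": 100.0 * grr / tv,
    }
    if tolerance is not None:
        # %tolerance compares a spread-sigma measurement width to USL - LSL
        result["pct_tolerance"] = 100.0 * spread * grr / tolerance
    return result


# Illustrative numbers only, not from a real study (mm)
print(percent_grr(ev=0.012, av=0.009, pv=0.08, tolerance=0.30))
```

The same gauge can look quite different on the two scales: with these made-up numbers it is about 18 percent of study variation but 30 percent of tolerance, which is why a report should always state which denominator it uses.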

The number of distinct categories (ndc) is a companion metric that tells you how many statistically distinct groups the measurement system can detect within the process variation. If ndc is 2, the gauge can only separate parts into two groups: low and high. It cannot rank parts or detect gradual trends. A process control chart drawn with data from a gauge with ndc below 5 will not detect meaningful process shifts until they are large relative to specification, by which point significant nonconforming product may have already been produced.
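
The AIAG MSA manual computes ndc as 1.41 times the ratio of part variation to GR&R (both as standard deviations), truncated to an integer; 1.41 approximates the square root of 2. A minimal sketch, with a hypothetical function name:

```python
import math


def number_of_distinct_categories(pv, grr):
    """ndc per the AIAG MSA convention: 1.41 * PV / GRR, truncated.

    pv and grr are standard deviations (part variation and total GR&R).
    """
    return math.floor(1.41 * pv / grr)


print(number_of_distinct_categories(pv=0.08, grr=0.015))  # 7
print(number_of_distinct_categories(pv=0.02, grr=0.015))  # 1
```

Because ndc depends on part variation as well as on the gauge, the same gauge produces a lower ndc when the study parts cover only a narrow slice of the process range.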

When to run a GR&R study

Run a GR&R before putting a new gauge into production use for process control or product acceptance decisions. If the measurement system is not adequate, the decisions made with its data are not reliable. Finding this out before the gauge is in production use is far less costly than discovering it after months of process control data has been collected.

Run a GR&R before and after any significant equipment change or gauge replacement. A replacement gauge of the same model is not automatically equivalent to the original for measurement system purposes. Different unit variation, different fixture fit, and different display response characteristics can all affect GR&R results. Qualification of the replacement gauge requires its own study.

Run a GR&R before a process improvement study. If you are planning a designed experiment or a statistical process improvement project, the measurement system must be capable of detecting the effect sizes you are trying to see. A %GR&R above 30 percent on the dimension you are trying to improve means the measurement system standard deviation is more than 30 percent of the total observed standard deviation. Measurement noise is added to every reading, so a real 10 percent reduction in process variation appears smaller in the data and takes more samples to distinguish from noise. Smaller improvements can be lost in the measurement noise entirely.

Run a GR&R when a process that was previously in statistical control shows unexplained variation. Before assuming the process has changed, check whether the measurement system has changed. A gauge that was dropped, a fixture that has worn, an operator who changed technique, or a calibration interval that has elapsed can all cause sudden increases in measured variation that look like process instability but originate in the measurement system.

In IATF 16949 automotive environments, GR&R studies are required as part of the production part approval process (PPAP) for all variable measurement systems used in product or process monitoring. ISO 13485 environments do not mandate GR&R studies by name but require that measurement equipment be suitable for its intended use, which in practice requires measurement system analysis for quantitative measurements used in product disposition.

How to set up a crossed GR&R study

The standard crossed GR&R design uses 10 parts, 2 to 3 operators, and 2 to 3 replicates per operator per part. This gives between 40 and 90 individual measurements. The design is called "crossed" because every operator measures every part, creating a fully balanced data structure that allows independent estimation of repeatability and reproducibility.
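
The balanced crossed structure is what lets an ANOVA separate the components. The function below is a minimal sketch written from the standard two-way random-effects model with a part-by-operator interaction; the function name and the `data[part][operator]` layout are assumptions. It always keeps the interaction term (some MSA software pools it when it is not significant), folds the interaction into AV, and sets negative variance estimates to zero. Results from validated MSA software should take precedence.

```python
import math
from statistics import fmean


def anova_grr(data):
    """Estimate GR&R components from a balanced crossed study (sketch).

    data[part][operator] is a list of replicate readings. Every operator
    must measure every part the same number of times (at least twice).
    Returns standard deviations; AV includes the interaction term.
    """
    parts = list(data)
    operators = list(data[parts[0]])
    p, o = len(parts), len(operators)
    r = len(data[parts[0]][operators[0]])
    if r < 2 or any(len(data[pt][op]) != r for pt in parts for op in operators):
        raise ValueError("study must be balanced with at least 2 replicates")

    cells = {(pt, op): data[pt][op] for pt in parts for op in operators}
    grand = fmean([x for vals in cells.values() for x in vals])
    part_mean = {pt: fmean([x for op in operators for x in data[pt][op]]) for pt in parts}
    op_mean = {op: fmean([x for pt in parts for x in data[pt][op]]) for op in operators}
    cell_mean = {key: fmean(vals) for key, vals in cells.items()}

    # Sums of squares for the two-way model with interaction
    ss_part = o * r * sum((m - grand) ** 2 for m in part_mean.values())
    ss_op = p * r * sum((m - grand) ** 2 for m in op_mean.values())
    ss_cells = r * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_part - ss_op
    ss_error = sum((x - cell_mean[key]) ** 2 for key, vals in cells.items() for x in vals)

    # Mean squares
    ms_part = ss_part / (p - 1)
    ms_op = ss_op / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Variance components from the expected mean squares
    var_ev = ms_error                                   # repeatability
    var_inter = max(0.0, (ms_inter - ms_error) / r)     # part x operator
    var_op = max(0.0, (ms_op - ms_inter) / (p * r))     # operator
    var_pv = max(0.0, (ms_part - ms_inter) / (o * r))   # part-to-part
    var_av = var_op + var_inter                         # reproducibility
    var_grr = var_ev + var_av
    return {
        "EV": math.sqrt(var_ev),
        "AV": math.sqrt(var_av),
        "GRR": math.sqrt(var_grr),
        "PV": math.sqrt(var_pv),
        "TV": math.sqrt(var_grr + var_pv),
    }
```

For the standard design, `data` would hold 10 part keys, each mapping 2 or 3 operator keys to lists of 2 or 3 readings.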

Part selection is the step most often done incorrectly. The 10 parts must span the expected process variation from low to high. If your process produces parts that typically range from 24.85 mm to 25.15 mm on the dimension of interest, select parts that cover that range: some near the low end, some near the middle, some near the high end. Do not select 10 parts that all measure between 25.00 and 25.05 mm. If your measurement system variation is 0.015 mm and your parts span only 0.05 mm, the GR&R percentage will be artificially inflated because the denominator (part-to-part variation) is too small. The result will show a poor GR&R even for an adequate measurement system, because you designed the study to find a problem.
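
The inflation effect can be checked numerically. This sketch assumes the 0.015 mm measurement system variation is a standard deviation, treats the selected part sizes as true values, and uses evenly spaced sizes purely for illustration; both helper names are hypothetical.

```python
import math
from statistics import stdev


def evenly_spaced(lo, hi, n=10):
    """n part sizes spread evenly from lo to hi (hypothetical selection)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def pct_grr_for_selection(part_sizes, sigma_ms):
    """%GR&R (study variation) if these were the parts' true sizes."""
    sigma_parts = stdev(part_sizes)
    return 100.0 * sigma_ms / math.hypot(sigma_ms, sigma_parts)


sigma_ms = 0.015                      # measurement system std dev, mm
wide = evenly_spaced(24.85, 25.15)    # spans the process range
narrow = evenly_spaced(25.00, 25.05)  # clustered near nominal

print(f"wide selection:   {pct_grr_for_selection(wide, sigma_ms):.1f}%")
print(f"narrow selection: {pct_grr_for_selection(narrow, sigma_ms):.1f}%")
```

The gauge is identical in both cases; only the part selection changes, yet the narrow selection reports roughly four times the %GR&R of the wide one (about 67 percent versus about 15 percent).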

Operator selection must reflect the actual production measurement population. Use the operators who perform this measurement during production, not engineers or quality personnel who measure parts occasionally. The study must represent what actually happens during production measurement. If three shifts use this gauge and the operators on each shift have different levels of experience, include operators from multiple shifts.

Randomize measurement order within each replicate. Each operator measures all 10 parts in a different random order during each replicate session. This prevents bias from learning effects, where an operator's technique improves over the course of a session, and from part conditioning effects, where repeated measurement changes the part's behavior. The random order must be generated before the study begins and strictly followed during execution.
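
One way to generate the order before the study starts is a seeded shuffle per operator per replicate. This is a minimal sketch; the function name and the tuple layout of the sheet are assumptions. The fixed seed lets the sheet be regenerated and audited later, and the sheet deliberately has no column for earlier readings.

```python
import random


def make_run_sheet(parts, operators, replicates, seed):
    """Randomized measurement order for a crossed GR&R study.

    Each operator gets a fresh shuffle of all parts for every replicate.
    Rows are (replicate, operator, position, part).
    """
    rng = random.Random(seed)
    sheet = []
    for rep in range(1, replicates + 1):
        for op in operators:
            order = list(parts)
            rng.shuffle(order)
            for position, part in enumerate(order, start=1):
                sheet.append((rep, op, position, part))
    return sheet


sheet = make_run_sheet(parts=range(1, 11), operators=["A", "B", "C"], replicates=3, seed=2024)
for row in sheet[:10]:
    print(row)
```

Printing one sheet per operator and replicate session, and collecting it before the next session starts, also supports the blind condition described below.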

The study must be conducted blind with respect to previous readings. Operators should not know their previous measurement of a part when they remeasure it. If an operator sees their previous reading before remeasuring, they are likely to reproduce it rather than measure independently. This suppresses repeatability variation artificially and produces an optimistic result. The standard approach is to record measurements outside the operator's line of sight or to provide measurement recording sheets that do not show previous values.

Environmental conditions during the study must match production conditions. If the gauge is used in a temperature-controlled room, conduct the study in that room. If it is used at a workstation subject to vibration from nearby equipment, conduct the study at that workstation. A GR&R conducted in controlled conditions that differ materially from production conditions does not characterize the measurement system as it is actually used.

How to interpret the results

A %GR&R below 10 percent indicates the measurement system is acceptable for most applications. Expressed against total variation, this means the measurement system standard deviation is less than one tenth of the total observed standard deviation. Because variances add and standard deviations do not, that corresponds to less than 1 percent of the observed variance, leaving nearly all of it attributable to actual part-to-part variation. Process control charts, capability studies, and product disposition decisions made with this data will reliably reflect the process rather than measurement noise.

A %GR&R between 10 and 30 percent is in a gray zone. The measurement system may be acceptable depending on the cost of measurement improvement, the criticality of the characteristic, and the requirements of your customers or regulatory body. An automotive customer operating under IATF 16949 may require a %GR&R below 10 percent for critical safety characteristics regardless of other considerations. A medical device manufacturer using this measurement for incoming inspection of a non-critical dimension may accept 25 percent with documented justification.

A %GR&R above 30 percent means the measurement system is not acceptable. Do not use data from this measurement system for process control or product disposition until the root cause of the excessive measurement variation is identified and corrected. Decisions made with a 35 percent GR&R result will be significantly influenced by measurement noise. A control chart run with this data can generate false alarms for process variation that does not exist, and can miss real process shifts that are masked by measurement variation.

The number of distinct categories must be 5 or greater for the gauge to be useful for process control. An ndc of 5 means the gauge can separate parts into five statistically distinct groups within the process variation. For a process control chart, this is the minimum needed to detect trends, runs, and shifts of meaningful size before the process drifts significantly out of control. Below 5, the gauge is adequate at most for pass or fail inspection against a tolerance, and even that use should be evaluated carefully.
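
The decision rules in this section can be captured in a small helper, for example to flag results consistently across many gauges. This is a minimal sketch with a hypothetical function name; it encodes only the 10 and 30 percent bands and the ndc minimum of 5, and cannot weigh criticality or customer requirements in the gray zone.

```python
def assess_measurement_system(pct_grr, ndc):
    """Apply the %GR&R bands and ndc minimum (hypothetical helper)."""
    if pct_grr < 10:
        verdict = "acceptable"
    elif pct_grr <= 30:
        verdict = "conditional: justify by criticality, cost, and customer requirements"
    else:
        verdict = "not acceptable: do not use for process control or disposition"
    if ndc < 5:
        verdict += "; ndc below 5, not suitable for process control"
    return verdict


print(assess_measurement_system(pct_grr=8.0, ndc=7))
print(assess_measurement_system(pct_grr=35.0, ndc=2))
```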

The split between repeatability and reproducibility in the results tells you where to look for the root cause. If repeatability is the dominant source of variation, the problem is the gauge itself: its inherent precision, its mechanical condition, or the measurement principle it uses. The solution is gauge repair, replacement, or substitution of a more precise measurement method. If reproducibility is the dominant source of variation, the problem is operator technique inconsistency. The solution is standardized work instruction for the measurement procedure, operator training, and potentially fixture design changes that reduce the influence of operator technique on the measurement result.
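
When comparing the two sources, it helps to compare variances rather than standard deviations, because variances add to the total GR&R while standard deviations do not. A minimal sketch with a hypothetical function name and illustrative inputs:

```python
def grr_breakdown(ev, av):
    """Percent of GR&R variance from repeatability vs reproducibility."""
    total = ev ** 2 + av ** 2
    ev_share = 100.0 * ev ** 2 / total
    av_share = 100.0 - ev_share
    if ev_share >= av_share:
        hint = "repeatability dominates: look at the gauge and its fixture"
    else:
        hint = "reproducibility dominates: look at operator technique and work instructions"
    return ev_share, av_share, hint


print(grr_breakdown(ev=0.012, av=0.009))
```

With these illustrative values, EV accounts for about 64 percent of the GR&R variance even though its standard deviation is only a third larger than AV's.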

Common GR&R failures and their causes

Operator technique inconsistency is the most common driver of high reproducibility. Different operators hold the part at different angles, apply different seating force against the fixture, or read an analog display from different viewing angles. All of these introduce systematic differences between operators that reproducibility captures. The fix is a detailed measurement work instruction that specifies exactly how the part is loaded into the fixture, what force is applied, and how the gauge is read. After the instruction is in place, conduct training and then repeat the GR&R study to confirm the improvement.

Fixture repeatability problems produce high equipment variation (repeatability) that is incorrectly attributed to the gauge. If the fixture does not hold the part in a consistent position and orientation for every measurement, the variation in part position becomes variation in the measurement reading. Distinguishing fixture variation from gauge variation requires a separate study that isolates the fixture: for example, repeated readings taken with the part left seated and undisturbed, compared with repeated readings where the part is removed and reloaded into the fixture before each reading. If the fixture is the problem, redesigning the fixture is the solution, not replacing the gauge.
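
One way to quantify such a comparison is to subtract variances: readings with the part left seated reflect the gauge alone, while readings with the part reloaded each time reflect gauge plus fixture. This sketch assumes the two contributions are independent and uses hypothetical readings and a hypothetical function name; with only a handful of readings per condition, the estimates are rough.

```python
import math
from statistics import stdev


def split_fixture_variation(seated, reloaded):
    """Estimate gauge-only and fixture standard deviations (sketch).

    seated:   repeated readings of one part left in the fixture
    reloaded: repeated readings of the same part, removed and reloaded
              before each reading
    Variances are assumed independent; a negative difference is set to zero.
    """
    s_gauge = stdev(seated)
    s_reload = stdev(reloaded)
    s_fixture = math.sqrt(max(0.0, s_reload ** 2 - s_gauge ** 2))
    return s_gauge, s_fixture


# Hypothetical readings, mm
seated = [12.503, 12.504, 12.503, 12.502, 12.503, 12.504]
reloaded = [12.503, 12.509, 12.498, 12.506, 12.500, 12.511]
g, f = split_fixture_variation(seated, reloaded)
print(f"gauge only: {g:.4f} mm, fixture: {f:.4f} mm")
```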

Gauge resolution that is too coarse for the tolerance is a fundamental incompatibility. If the gauge can only resolve 0.1 mm increments and the tolerance on the dimension is 0.2 mm total (plus or minus 0.1 mm), the gauge can report only three values within the specification: -0.1 mm (lower limit), 0.0 mm (nominal), and +0.1 mm (upper limit). It cannot detect variation between those values. The study will report a high %GR&R because the measurement system cannot discriminate within the variation that exists. The only solution is to replace the gauge with one that has sufficient resolution, typically no coarser than one tenth of the tolerance (a tolerance-to-resolution ratio of at least 10:1).
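
A quick pre-study check compares resolution with tolerance before any parts are measured. This minimal sketch uses a hypothetical function name and assumes the specification limits fall on gauge graduations.

```python
import math


def resolution_check(resolution, lsl, usl, min_ratio=10):
    """Compare gauge resolution with the tolerance (hypothetical helper).

    Returns the tolerance-to-resolution ratio, the number of distinct
    readings the gauge can show from LSL to USL inclusive, and whether
    the ratio meets min_ratio. A small epsilon absorbs float rounding.
    """
    ratio = (usl - lsl) / resolution
    readings = math.floor(ratio + 1e-9) + 1
    return ratio, readings, ratio + 1e-9 >= min_ratio


print(resolution_check(resolution=0.1, lsl=-0.1, usl=0.1))
print(resolution_check(resolution=0.01, lsl=-0.1, usl=0.1))
```

For the example above, the 0.1 mm gauge gives a ratio of 2 and three possible readings inside the specification, while a 0.01 mm gauge gives a ratio of 20 and 21 readings.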

Environmental factors that are not controlled during the study but are present during production will not appear in the GR&R result if the study is conducted under controlled conditions. Temperature cycles, vibration from nearby presses or conveyors, humidity effects on sensitive weight measurements, and air currents affecting optical sensors all contribute to measurement variation in production but not in a study conducted in the metrology lab. For a measurement system used under production environmental conditions, the GR&R study must be conducted under those same conditions to characterize the full measurement system variation as it actually occurs.

For guidance on building your manufacturing measurement documentation or connecting GR&R results to a broader quality system, see the Aptibot documentation service and the guide on writing a manufacturing SOP.