1030 DATA QUALITY*

* Reviewed by Standard Methods Committee, 2011. Joint Task Group: 20th Edition—Kenneth E. Osborn (chair), Paul W. Britton, Robert D. Gibbons, James M. Gindelberger, Nancy E. Grams, Lawrence H. Keith, Ann E. Rosecrance, Robert K. Wyeth.

1030 A. Introduction
An analytical laboratory’s role is to produce measurement-based data that is technically valid, legally defensible, and of known quality. All measurements contain error, which may be systematic (unvarying magnitude) or random (varying magnitude and equal probability of being positive or negative). A method’s analytical performance is defined by its unique combination of systematic and random errors.1 Quality assurance is a program designed to make the measurement process as reliable as possible. Quality control (QC) procedures are activities designed to identify and determine sources of error.
1. Measures of Quality Control
Two routine indicators of measurement quality that analysts use to assess a method’s validity are precision (random error) and bias (systematic error). Precision indicates how closely repeated measurements agree. A measurement’s precision is acceptable if its random errors are low. Accuracy indicates how close a measurement is to the true value. A measurement is acceptably accurate when both systematic and random errors are low. QC results outside acceptance limits (which are set by data quality objectives) are evidence that a method may be out of control due to determinate errors (e.g., contaminated reagents or degraded standards).
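As a concrete illustration of these two indicators, the sketch below computes precision (as relative standard deviation) and bias (as signed deviation and percent recovery) from replicate analyses of a check standard. This is a minimal Python sketch; the standard's accepted value and the replicate results are hypothetical.

```python
import numpy as np

# Hypothetical replicate analyses of a QC check standard whose
# accepted true value is 50.0 ug/L (all numbers illustrative).
true_value = 50.0
replicates = np.array([50.6, 49.1, 51.2, 50.3, 49.8, 50.9, 49.5])

mean = replicates.mean()
s = replicates.std(ddof=1)              # sample standard deviation

rsd_percent = 100.0 * s / mean          # precision (random error)
bias = mean - true_value                # bias (systematic error)
recovery_percent = 100.0 * mean / true_value

print(f"mean = {mean:.2f} ug/L, s = {s:.2f} ug/L")
print(f"precision (RSD) = {rsd_percent:.1f}%")
print(f"bias = {bias:+.2f} ug/L, recovery = {recovery_percent:.1f}%")
```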
2. Measurement Error and Data Use
Both random and systematic measurement errors make laboratory data less reliable. As a measured value decreases, its relative error (e.g., relative standard deviation) may increase, making its validity more uncertain. Reporting tools (e.g., detection or quantitation limits) frequently are used to establish a lower concentration limit for reporting data that incorporate statistical uncertainty.
Laboratory data may be used for such purposes as regulatory monitoring, environmental decision-making, and process control. The procedures used to extract information for different purposes vary and may be diametrically opposed. For example, a regulatory monitoring measurement that is below the detection level may be appropriately qualified because the error bar is relatively large and may preclude a statistically sound decision. However, data collected over time may be treated by statistical methods to provide a statistically sound decision even if many of the values are below nominal detection levels.2
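One statistical treatment in the spirit of reference 2 is to fit a distribution by maximum likelihood while treating "less-than" values as left-censored, instead of substituting zero or the detection level for them. The sketch below assumes normally distributed concentrations and a known censoring threshold; it illustrates the general idea rather than a procedure prescribed by this section.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
DL = 5.0                               # nominal detection level
data = rng.normal(6.0, 2.0, size=100)  # simulated true concentrations
censored = data < DL                   # values the lab would report as "< DL"
observed = data[~censored]
n_cens = censored.sum()

def neg_log_like(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # Detected values contribute their density; censored values contribute
    # the probability of falling below the detection level.
    ll = stats.norm.logpdf(observed, mu, sigma).sum()
    ll += n_cens * stats.norm.logcdf(DL, mu, sigma)
    return -ll

fit = optimize.minimize(neg_log_like, x0=[observed.mean(), observed.std()],
                        method="Nelder-Mead")
mu_hat, sigma_hat = fit.x
print(f"censored MLE: mean = {mu_hat:.2f}, sd = {sigma_hat:.2f}")
print(f"substituting DL/2: mean = {np.where(censored, DL/2, data).mean():.2f}")
```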
3. The Analyst’s Responsibility
The analyst must understand the QC measures and how to
apply them to the data quality objectives (DQOs) of process
control, regulatory monitoring, and environmental field studies.
It is important that DQOs be clearly defined and detailed before
sample analysis begins so the data will be technically correct and
legally defensible.
4. References
1. YOUDEN, W.J. 1987. Statistical Manual of the Association of Official Analytical Chemists. Assoc. Official Analytical Chemists International, Arlington, Va.
2. OSBORN, K.E. 1995. You can’t compute with less thans. Water Environment Laboratory Solutions. Water Environment Federation, Alexandria, Va.
1030 B. Measurement Uncertainty
1. Introduction
Even when obtained with the greatest possible care, every measurement has errors that ultimately are unknown and unknowable. These errors collectively result in what is called measurement uncertainty. Reporting uncertainty with each measurement—to the extent that it is identified and estimated—is good practice and may spare users from making unwarranted or risky decisions based on the measurement alone.
Measurement error (E) is the actual, unknown deviation of the measurement (M) from the unknown true value (T). Measurement uncertainty (U) is the state of knowledge about this unknown deviation. U may be defined as an uncertainty expression.1,2 This section concerns the definition of U, how to compute it, a recommendation for reporting it, its interpretation and scope, and other ways of expressing it.
2. Error
A measurement can be related to the unknown true value and unknown measurement error as follows:

M = T + E

This is a simple additive relationship. Other plausible relationships between M and E (e.g., multiplicative or arbitrary functional relationships) are not discussed here.
Because E is unknown, M must be regarded as an uncertain measurement. Sometimes, a true value may be treated as known (e.g., T* may be a published reference value, a traceable value, or a consensus value) for convenience or because the method that produced T* has less bias or variation than the one that produced M. For example, based on the average of many measurements, a vessel might be thought to contain T* = 50 µg/L of salt in water. It then may be sampled and routinely measured, resulting in a reported concentration of M = 51 µg/L. The actual concentration may be T = 49.9 µg/L, resulting in E = 51 − 49.9 = +1.1 µg/L.
To generalize the nature of uncertainty, E may be negligible or large in absolute terms (i.e., the original units) or relative terms (i.e., unitless, E/T or E/T*). The acceptability of an absolute error’s magnitude depends on its intended use. For example, an absolute error of 1.1 µg/L may be inconsequential for an application in which any concentration over 30 µg/L is sufficient. However, as a precision-measurement standard (e.g., for pharmaceutical ingredients), an absolute error of 1.1 µg/L could be too large.
3. Uncertainty
The reported measurement uncertainty will contain the actual measurement error with a stated level of confidence. For example, if M ± U is presented as a 95% confidence interval, then approximately 95% of the time, E will fall within the range of ±U.
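A brief simulation makes this coverage interpretation concrete. It assumes normal random errors with known standard deviation and takes U = 2σ; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 50.0          # true value (unknown in practice)
sigma = 1.0       # standard deviation of the random error
U = 2 * sigma     # half-width of the reported interval

M = T + rng.normal(0.0, sigma, size=100_000)  # simulated measurements
covered = np.abs(M - T) <= U                  # is |E| within U?
print(f"coverage = {covered.mean():.3f}")     # about 0.954 for normal errors
```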
4. Bias
Bias (systematic error) is the signed (+ or −) deviation between the average measured value and the true value as the number of averaged measurements tends toward infinity and the related uncertainty tends toward zero. For example, the reason a 49.9-µg/L salt solution (T) is thought to be 50 µg/L (T*) could be a bias (B = 0.1 µg/L). The “leftover” error (1.1 − 0.1 = 1.0 µg/L) is the random component (stochastic error) that changes with each measurement.
The bias is fixed and may be related to the method used to produce T*. Usually, a recognized method will be used to produce or certify a traceable standard—a sample with a certificate stating the accepted true value (T*). This method may be either the best or most widely accepted method available, chosen because of its minimal bias and stochastic error. Such a traceable standard may be purchased from a standards organization [e.g., National Institute of Standards and Technology (NIST)].
5. Bias and Random Variation
Both E and U can be split into two components:

E = Z + B

where:

Z = random error, and
B = systematic error.
Random error (Z) is the component that changes from one measurement to the next under certain conditions. It is assumed to be independent and to have a distribution—typically Gaussian (normal distribution). The normal distribution of Z is characterized by a mean (µ) of zero (because any non-zero component is part of bias, by definition) and the traditional standard deviation (σ_E). In other words, about 95% of Z will lie within the interval ±2σ_E. So if there is no bias and E is independent and normally distributed, then M ± 2σ_E would be a suitable way to report a measurement and its uncertainty. (Normal probability tables and statistical software give the proportions of the normal distribution, and thus the percent confidence gained that an observation is contained within ±kσ_E, for any value of scalar k.) However, σ_E usually is unknown and must be estimated by the sample standard deviation (s_E), which is based on multiple observations and statistical estimation. In this case, scalar k is not chosen based on the normal distribution but rather on the Student’s t distribution, taking into account the number of degrees of freedom associated with s_E.
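For instance, the scalar k for a two-sided 95% level can be taken from the Student's t distribution once s_E and its degrees of freedom are known. A minimal sketch, with illustrative replicate data used only to estimate s_E:

```python
import numpy as np
from scipy import stats

# Illustrative replicate measurements used only to estimate s_E.
replicates = np.array([49.8, 51.0, 50.2, 49.5, 50.7, 50.4])
s_E = replicates.std(ddof=1)
df = len(replicates) - 1

# k from the Student's t distribution (two-sided 95% level).
k = stats.t.ppf(0.975, df)

M = 50.3   # a single reported measurement, following the M +/- k*s_E form above
print(f"M = {M} +/- {k * s_E:.2f} (k = {k:.3f} with {df} df, vs. 2 for known sigma)")
```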
Systematic error (B) is the nonrandom component; it typically is equated with bias and can include outright mistakes (analyst blunders) and lack of control (drifts, fluctuations, etc.).3 In this manual, the terms systematic error and bias are intended to be used interchangeably.
B often is more difficult to estimate and make useful than Z is. Knowledge about bias is likely to be hard to obtain; once obtained, it is likely to be exploited to make the measurement less biased or repeated (an appropriate response). If bias is known exactly (or nearly so), the user can subtract it from M to reduce total measurement error.
If bias is unknown (i.e., could be one of a wide but unknown distribution of plausible values), users may adopt a worst-case approach and report an extreme value, re-test, or simply ignore bias altogether. For example, historical data may indicate that interlaboratory biases are significant or that QC measurements of standards shift every time a measurement system is cleaned. Without traceable standards, it is hard for laboratory personnel to do anything other than be ignorant of the potential problem.
The recommended practice for many methods is to conduct routine QA/QC measurements on a suite of internal standards, plot the measurements on control charts, and, when an out-of-control condition occurs, recalibrate the system with traceable standards. This permits the laboratory to publish a boundary on bias—assuming that the measurement system’s underlying behavior is somewhat predictable and that changes between QA/QC samplings are acceptably small (e.g., slow drifts and small shifts). Many analytical methods are not amenable to use of internal standards in each sample, so external standards and calibration standards must be relied on for an entire set of samples in an analytical run.
6. Repeatability, Reproducibility, and Sources of Bias and Variation
a. Sources and measurement: The sources of bias and variability include sampling error; sample preparation; interference by matrix or other measurement quantities/qualities; variations in calibration error; software errors; counting statistics; an analyst’s deviations from the method; instrument differences (e.g., chamber volume, voltage level); environmental changes (temperature, humidity, ambient light, etc.); contamination of sample or equipment (e.g., carryover and ambient contamination); variations in purity of solvent, reagent, catalyst, etc.; stability and age of sample, analyte, or matrix; and warm-up or cool-down effects (tendency to drift over time). The simplest strategy for estimating bias is to measure a traceable (known) standard and then compute the difference between M and T*:

M − T* = B + Z
The uncertainty in this case is assumed to be small, although in practice there may be situations in which this assumption is inappropriate. If random uncertainty (Z) is negligible (i.e., Z ≈ 0), then M − T* will provide an estimate of bias (B). If Z is not negligible, it can be observed and quantified by repeatedly measuring the same test specimen (if the measurement process is not destructive). This may be part of a QA/QC procedure.
b. Repeatability: Repeatability (also called intrinsic measurement variability) is the smallest amount of variation that remains in a measurement system when repeatedly measuring the same specimen while preventing controllable sources of variability from affecting results. It is quantified by the repeatability standard deviation (σ_RPT), which can be obtained by pooling the sample standard deviations of measurements of J specimens:

\sigma_{RPT} = \sqrt{\frac{1}{J} \sum_{i=1}^{J} s_{RPT,i}^{2}}

Repeatability is considered an approximate lower boundary to the standard deviation experienced in practice. The repeatability standard deviation sometimes is used to compute uncertainty intervals (U) (referred to as ultimate instrument variability) based on the Student’s t distribution (U = ±k s_RPT).
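The pooling computation is straightforward when each specimen contributes the same number of replicates; unequal replicate counts would require degrees-of-freedom weighting. A sketch with illustrative data for J = 3 specimens:

```python
import numpy as np

# Repeated measurements of J = 3 specimens under repeatability
# conditions (same analyst, instrument, day); values are illustrative.
specimens = [
    np.array([10.1, 10.3, 9.9, 10.2]),
    np.array([20.4, 20.1, 20.6, 20.3]),
    np.array([30.2, 29.8, 30.1, 30.4]),
]

# Root of the mean within-specimen variance, per the formula above.
variances = [x.var(ddof=1) for x in specimens]
s_rpt = np.sqrt(np.mean(variances))
print(f"pooled repeatability s_RPT = {s_rpt:.3f}")
```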
Common sense and experience demonstrate that repeatability is an overly optimistic estimate of uncertainty for routine measurements, which are subject to many sources of bias and variability that are intentionally eliminated or restrained during a repeatability study. The uncertainties in both B and Z are greater in routine measurements.
c. Reproducibility: Reproducibility is the variation in a measurement system that occurs when repeatedly measuring a sample while allowing (or requiring) selected sources of B or Z to affect results. It is quantified by the reproducibility standard deviation (σ_RPD), accompanied by a list of known applicable sources of B and Z, and notes on which sources varied.
Barring statistical variation (i.e., variation in estimates of variability, such as the noisiness in sample standard deviations), σ_RPD is always greater than σ_RPT because it has more components. Typically, one or more of the following varies in a reproducibility study: instrument, analyst, laboratory, or day. Preferably, design a study tailored to the particular measurement system (see 1030B.7). If the sample varies, compute σ_RPD separately for each sample, then pool the homogeneous results.
Treat factors that vary as random factors and assume they are independent normal random variables with a mean of zero. However, this assumption often can be challenged if the sample and possibly the target populations are small (even identical); there may be a question of “representativeness.” Suppose, for example, that out of 20 laboratories (or analysts or instruments) that can do tandem mass spectrometry for a particular analyte and matrix, only six report usable measurements. It is hard to know how representative these six are—especially after a post-study ranking and exclusion process—and whether the Bs of the 20 are normally distributed (probably not discernible from six measurements, even if they are representative).
It may be more appropriate to treat each factor with few known values (e.g., laboratories) as a fixed factor, which has fixed effects. In other words, each laboratory, analyst, instrument, or day has a different bias, but its distribution is assumed to be unknown (or unknowable), so a small sample cannot be used to estimate distribution parameters, particularly standard deviation. For example, assuming that variables are random, are normal, and have a mean of zero may be inappropriate in an interlaboratory round-robin study. Every laboratory has some B, but it is difficult to characterize because of laboratory anonymity, the small number of laboratories contributing usable data, etc. Because of these concerns about assumptions and the potential ambiguity of its definition, do not report reproducibility unless it is accompanied by the study design, a list of known sources of B and Z, and notes on which sources varied.
7. Gage Repeatability and Reproducibility, and the Measurement Capability Study
The Gage repeatability and reproducibility (Gage R&R) approach combines repeatability and reproducibility.4 It treats all factors (including B) as random and is based on the simplest nontrivial model:

Z = Z_RPT + Z_L

where:

Z_RPT = normally distributed random variable with mean equal to zero and variance equal to σ_RPT², and
Z_L = normally distributed random variable with mean equal to zero and with the variance of the factor (e.g., interlaboratory) biases, σ_L².
The overall measurement variation then is quantified by

\sigma_E = \sigma_{RPD} = \sqrt{\sigma_{RPT}^{2} + \sigma_{L}^{2}}
Estimates for σ_RPT and σ_RPD usually are obtained by conducting a nested designed study and analyzing the components of the results’ variance. This approach can be generalized to reflect good practice in conducting experiments. The following measurement capability study (MCS) procedure is recommended. The goal is not necessarily to quantify the contribution of every source of B and Z, but rather to study those considered important via systematic error budgeting.
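For the balanced one-factor (e.g., interlaboratory) case, σ_RPT and σ_L can be estimated from ANOVA mean squares. The sketch below assumes a balanced design with illustrative data; a real MCS may need a more elaborate nested model.

```python
import numpy as np

# n replicate measurements of the same sample in each of L laboratories
# (balanced one-factor random-effects design; numbers are illustrative).
labs = [
    np.array([50.1, 50.4, 49.9, 50.2]),
    np.array([51.0, 51.3, 50.8, 51.1]),
    np.array([49.2, 49.5, 49.1, 49.4]),
]
L, n = len(labs), len(labs[0])

grand = np.mean([x.mean() for x in labs])
ms_within = np.mean([x.var(ddof=1) for x in labs])                # MS_error
ms_between = n * np.sum([(x.mean() - grand) ** 2 for x in labs]) / (L - 1)

sigma_rpt = np.sqrt(ms_within)                        # repeatability component
sigma_L2 = max((ms_between - ms_within) / n, 0.0)     # lab variance component
sigma_rpd = np.sqrt(sigma_rpt**2 + sigma_L2)          # reproducibility

print(f"sigma_RPT = {sigma_rpt:.3f}, sigma_L = {np.sqrt(sigma_L2):.3f}, "
      f"sigma_RPD = {sigma_rpd:.3f}")
```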
When performing an MCS to assess U via systematic error budgeting, begin by identifying sources of B and Z that affect E. This can be done with a cause-and-effect diagram—perhaps with source categories of equipment, analyst, method (procedure and algorithm), material (aspects of test specimens), and environment.
Select sources to study either empirically or theoretically. Typically, study sources that are influential, that can be varied during the MCS, and that cannot be eliminated during routine measurement. Select models for the sources. Treat sources of B as fixed factors, and sources of Z as random factors.
Design and conduct the study, allowing (or requiring) the selected sources to contribute to measurement error. Analyze the data graphically and statistically [e.g., by regression analysis, analysis of variance (ANOVA), or variance components analysis]. Identify and possibly eliminate outliers (observations with responses that are far out of line with the general pattern of the data) and leverage points (observations that exert high, perhaps undue, influence).
Refine the models if necessary (e.g., based on residual analysis), and draw inferences for future measurements. For random effects, this probably will be a confidence interval; for fixed effects, a table of estimated Bs.
8. Other Assessments of Measurement Uncertainty
The following procedures for assessing measurement uncertainty are discussed below in order of increasing empiricism.
a. Exact theoretical: Some measurement methods are closely
tied to exact first-principles models of physics or chemistry. For
example, measurement systems that count or track the position
and velocity of atomic particles can have exact formulas for
uncertainty based on the particles’ known theoretical behavior.
b. Delta method (law of propagation of uncertainty): If a result can be expressed as a function of input variables with known error distributions, then sometimes the distribution of such results can be computed exactly.
c. Linearized: The delta method’s mathematics may be difficult, so a linearized form of M = T + E may be used instead. It involves a first-order Taylor series expansion about key variables that influence E:

\Delta M = \Delta T + \frac{\partial M}{\partial G_1}\,\Delta G_1 + \frac{\partial M}{\partial G_2}\,\Delta G_2 + \frac{\partial M}{\partial G_3}\,\Delta G_3 + \cdots

for sources G_1, G_2, G_3, etc., of B and Z that are continuous variables (or can be represented by continuous variables). The distribution of this expression may be simpler to determine because it involves the linear combination of scalar multiples of the random variables.
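A worked example of this linearized propagation, for a hypothetical result M = G1/G2 with independent inputs (function and uncertainty values are illustrative):

```python
import numpy as np

# Illustrative result: M = G1 / G2 (e.g., a mass divided by a volume).
G1, u1 = 25.0, 0.5    # input value and its standard uncertainty
G2, u2 = 0.50, 0.01

# First-order sensitivities (partial derivatives of M).
dM_dG1 = 1.0 / G2
dM_dG2 = -G1 / G2**2

# Law of propagation of uncertainty for independent inputs.
u_M = np.sqrt((dM_dG1 * u1) ** 2 + (dM_dG2 * u2) ** 2)
print(f"M = {G1 / G2:.1f} +/- {u_M:.2f}")
```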
d. Simulation: The delta method also is used to conduct computer simulations. If the distributions of Es in input variables are known or can be approximated, then a computer simulation (e.g., Monte Carlo) can empirically obtain the distribution of Es in the result. It typically generates 1000 to 10 000 sets of random deviates (each set has one random deviate per variable), then computes and archives M. The archived distribution is an empirical characterization of U in M.
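Continuing the hypothetical M = G1/G2 example, a Monte Carlo run can be compared against the linearized estimate above; input distributions are assumed normal for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000   # number of simulated measurement sets

G1 = rng.normal(25.0, 0.5, size=N)    # input 1 with its uncertainty
G2 = rng.normal(0.50, 0.01, size=N)   # input 2 with its uncertainty
M = G1 / G2                           # archived simulated results

print(f"mean M = {M.mean():.1f}, empirical u(M) = {M.std(ddof=1):.2f}")
print(f"95% interval: {np.percentile(M, 2.5):.1f} "
      f"to {np.percentile(M, 97.5):.1f}")
```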
e. Sensitivity study (designed experiment): If the identities and distributions of B and Z sources are known, and the sources are continuous factors but the functional relationship between them and M is unknown, then analysts can conduct an empirical sensitivity study (i.e., an MCS) to estimate the low-order coefficients (∂M/∂G) for any factor G. This will produce a Taylor series approximation of ΔM, which can be used to estimate the distribution of ΔM, as in ¶ c above.
f. Random effects study: This is the nested MCS and variance
components analysis described in 1030B.7.
g. Passive empirical (QA/QC data): An even more empirical
and passive approach is to rely solely on QA/QC or similar data.
The estimated standard deviation of sample measurements taken
over many days by different analysts using different equipment
(perhaps in different laboratories) can provide a useful indication
of U.
9. Uncertainty Statements
Ideally, measurements should be reported with an uncertainty statement (and its basis). Develop uncertainty statements as follows.4–8
With the help of data users, experts on the measurement system’s principles and use, and experts on sampling contexts, generate a cause-and-effect diagram for E that identifies and prioritizes sources of B and Z (factors). Consult literature quantifying B and Z. If needed, conduct one or more MCSs—incorporating the sources considered most important—to provide “snapshot” estimates of B and Z (sometimes Gage R&R studies may be sufficient).
Institute a QA/QC program in which analysts routinely measure traceable or internal standards and plot the results on X̄ and R control charts (or equivalent charts). React to out-of-control signals on these charts (e.g., recalibrate using traceable standards when the mean control chart shows a statistically significant change). Use the control charts, relevant literature, and the MCSs to develop uncertainty statements that involve both B and Z.
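A sketch of the control-chart arithmetic for subgrouped QC measurements, using the standard X̄ and R chart constants for subgroups of size 4 (A2 = 0.729, D3 = 0, D4 = 2.282); the data are illustrative.

```python
import numpy as np

# QC measurements of an internal standard, grouped 4 per day (illustrative).
subgroups = np.array([
    [50.2, 49.8, 50.5, 50.1],
    [50.0, 50.3, 49.7, 50.4],
    [49.9, 50.6, 50.2, 49.8],
])
A2, D3, D4 = 0.729, 0.0, 2.282   # chart constants for subgroup size n = 4

xbar = subgroups.mean(axis=1)                       # subgroup means
R = subgroups.max(axis=1) - subgroups.min(axis=1)   # subgroup ranges
xbarbar, Rbar = xbar.mean(), R.mean()

print(f"Xbar chart: CL = {xbarbar:.2f}, "
      f"UCL = {xbarbar + A2 * Rbar:.2f}, LCL = {xbarbar - A2 * Rbar:.2f}")
print(f"R chart:    CL = {Rbar:.2f}, "
      f"UCL = {D4 * Rbar:.2f}, LCL = {D3 * Rbar:.2f}")
```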
10. References

1. YOUDEN, W.J. 1972. Enduring values. Technometrics 14:1.
2. HENRION, M. & B. FISCHHOFF. 1986. Assessing uncertainty in physical constants. Amer. J. Phys. 54:791.
3. CURRIE, L. 1995. Nomenclature in evaluation of analytical methods including detection and quantification capabilities. Pure Appl. Chem. 67:1699.
4. MANDEL, J. 1991. Evaluation and Control of Measurements. Marcel Dekker, New York, N.Y.
5. NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY. 1994. Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, Technical Note 1297. National Inst. Standards & Technology, Gaithersburg, Md.
6. INTERNATIONAL STANDARDS ORGANIZATION. 2008. ISO/IEC Guide 98-3:2008, Part 3: Guide to the Expression of Uncertainty in Measurement. International Standards Org., Geneva, Switzerland.
7. KOCHERLAKOTA, N., R. OBENAUF & R. THOMAS. 2002. A statistical approach to reporting uncertainty on certified values of chemical reference materials for trace metal analysis. Spectroscopy 17(9):20.
8. GUEDENS, W.J., J. YPERMAN, J. MULLENS, E.J. PAUWELS & L.C. VAN POUCKE. 1993. Statistical analysis of errors: a practical approach for an undergraduate chemistry lab—Part 1. J. Chem. Ed. 70(9):776; Part 2, 70(10):838.
1030 C. Method Detection Level
1. Introduction
Detection levels are controversial, principally because terms are inadequately defined and often confused. For example, the terms instrument detection level (IDL) and method detection level (MDL) often are incorrectly used interchangeably. That said, most analysts agree that a detection level is the smallest amount of a substance that can be detected above the noise in a procedure and within a stated confidence level. Confidence levels are set so the probabilities of both Type I errors (false detection) and Type II errors (false nondetection) are acceptably small. Use of the term “detection limit” has been avoided herein to prevent confusion with regulatory usage of the term.
Currently, there are several types of detection levels—IDL, MDL, lower level of detection (LLD), and level of quantitation (LOQ)—each with a defined purpose (Section 1010C). The relationship among them is approximately IDL:LLD:MDL:LOQ = 1:2:4:10. (Occasionally, analysts use the IDL as a guide for determining the MDL.)
2. Determining Detection Levels
An operating analytical instrument usually produces a signal even when no sample is present (e.g., electronic noise) or when a blank is being analyzed (e.g., molecular noise). Because any QA program requires frequent analysis of blanks, the mean and standard deviation of this background signal become well known; the blank signal can become very precise (i.e., the Gaussian curve of the blank distribution becomes very narrow). The instrument detection level is the constituent concentration that produces a signal greater than three standard deviations of the mean noise level, or that can be determined by injecting a standard into the instrument to produce a signal that is five times the signal-to-noise ratio. The IDL is useful for estimating the constituent concentration (amount) in an extract needed to produce a signal that permits calculating an estimated MDL.
The lower level of detection is the amount of constituent that produces a detectable signal in 99% of trials. Determine the LLD by analyzing multiple samples of a standard at near-zero concentrations (no more than five times the IDL). Determine the standard deviation (s) by the usual method. To reduce the probability of a Type I error to 5%, multiply s by 1.645 (from a cumulative normal probability table). To also reduce the probability of a Type II error to 5%, multiply s by 3.290 instead. For example, if 20 determinations of a low-level standard yield an s of 6 µg/L, then the LLD is 3.29 × 6 ≈ 20 µg/L.1
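The LLD arithmetic can be scripted directly; the sketch below simulates the 20 low-level determinations rather than using real data.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated 20 determinations of a low-level standard (illustrative;
# constructed so s comes out near 6 ug/L).
determinations = rng.normal(10.0, 6.0, size=20)
s = determinations.std(ddof=1)

lld_type1 = 1.645 * s   # 5% false-detection (Type I) risk only
lld_both = 3.290 * s    # also 5% false-nondetection (Type II) risk
print(f"s = {s:.1f} ug/L, LLD = {lld_both:.1f} ug/L")
```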
The MDL differs from the LLD in that samples containing the constituent of interest are processed through the complete analytical method. MDLs are larger than LLDs because of extraction efficiency and extract concentration factors. The procedure for determining MDLs is outlined in Section 1020B.4.
Although the LOQ is useful in a laboratory, the practical quantitation limit (PQL) has been proposed as the lowest level achievable among laboratories within specified limits during routine laboratory operations.2 The PQL is significant because different laboratories will produce different MDLs even when using the same analytical procedures, instruments, and sample matrices. The PQL, which is about three to five times larger than the MDL, is a practical and routinely achievable detection level with a relatively good certainty that any reported value is reliable.
Numerous other definitions of detection and quantitation levels have recently been evaluated and are still under discussion.3,4 In addition, a few terms are in use as de facto specific reporting levels (RLs). These include the MDL, PQL, minimum quantifiable level (MQL), and minimum reporting level (MRL). These may be in use in various sections of Standard Methods and are included in the glossary in Section 1010C.
3. Description of Levels
Figure 1030:1 illustrates the detection levels discussed above. For this figure, it is assumed that the signals from an analytical instrument are distributed normally and can be represented by a normal (Gaussian) curve.5 The curve labeled B is representative of the background or blank signal distribution. As shown, the distribution of blank signals is nearly as broad as for the other distributions (i.e., σ_B ≈ σ_I ≈ σ_L). As blank analyses continue, this curve will become narrower because of increased degrees of freedom.
The curve labeled I represents the IDL. Its average value is located kσ_B units distant from the blank curve, and k represents the value of t (from the one-sided t distribution) that corresponds to the confidence level chosen to describe instrument performance. For a 95% level and n = 14, k = 1.782; for a 99% limit, k = 2.68. The overlap of the B and I curves indicates the probability of a Type II error.
The curve labeled L represents the LLD. Because only a finite number of determinations is used to calculate the IDL and LLD, the curves are similar to the blank’s, only broader, so it is reasonable to choose σ_I = σ_L. Therefore, the LLD is kσ_I + kσ_L = 2kσ_L from the blank curve.
Figure 1030:1. Detection level relationship.
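The quoted k values can be reproduced from the one-sided Student's t distribution. They match 12 degrees of freedom; reading n = 14 as implying df = 12 is an assumption made here.

```python
from scipy import stats

df = 12   # assumed: df = n - 2 for n = 14, which reproduces the quoted k values
for conf in (0.95, 0.99):
    k = stats.t.ppf(conf, df)    # one-sided t quantile at this confidence level
    print(f"{conf:.0%} level: k = {k:.3f}")
# 95% level: k = 1.782;  99% level: k = 2.681
```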