Adaptive Hybrid Clock Discipline Algorithm for the Network Time Protocol

David L. Mills
ACM 1036540 M 63
IEEE 01201649 M 87

Abstract

This paper describes the analysis, implementation and performance of a
new algorithm engineered to discipline a computer clock to a source of
standard time, such as a GPS receiver or another computer synchronized
to such a source. The algorithm is intended for the Network Time
Protocol (NTP), which is in widespread use to synchronize computer
clocks in the global Internet, or with another functionally equivalent
protocol such as DTSS or PCS. It controls the computer clock time and
frequency using an adaptive-parameter, hybrid phase/frequency-lock
feedback loop. Compared with the current NTP Version 3 algorithm, the
new algorithm developed for NTP Version 4 provides improved accuracy and
reduced network overhead, especially when per-packet or per-call charges
are involved. The algorithm has been implemented in a special purpose
NTP simulator, which also includes the entire suite of NTP algorithms.
The performance has been verified using this simulator and both
synthetic data and real data from Internet time servers in Europe, Asia
and the Americas.

Keywords: computer network time synchronization, clock discipline
algorithm, oscillator error modeling, feedback control loop

1. Introduction

General purpose workstation computers are becoming faster each year,
with processor clocks now operating at 300 MHz and above. Computer
networks are becoming faster as well, with speeds of 622 Mbps available
now and 2.4 Gbps being installed. Using available technology with
existing workstations and Internet paths, it has been demonstrated that
computers can be reliably synchronized to better than a millisecond in
LANs and better than a few tens of milliseconds in most places in the
global Internet [4]. This technology includes the Network Time Protocol
(NTP), now used in an estimated total of over 100,000 servers and
clients in the global Internet. Over 230 public primary servers are
available in this network, each connected to a external source, such as
a GPS receiver or ACTS telephone modem.

Reliable network synchronization requires crafted algorithms which
minimize jitter on diverse network paths between clients and servers,
determine the best subset of redundant servers, and discipline the
computer clock in both time and frequency. NTP is designed to do this in
Unix, Windows and VMS operating systems. The NTP architecture, protocol
and algorithms have evolved over almost two decades, with the latest NTP
Version 3 designated an Internet (draft) standard.

At the heart of the NTP design are the algorithms that synchronize the
computer clock to NTP servers elsewhere in the Internet or an external
source. These include the algorithms that mitigate among multiple
servers to provide the most accurate and reliable time, together with
the algorithm that disciplines the computer clock with respect to this
time. The clock discipline algorithm, shortened to discipline in the
following, is the main topic of this paper. It has evolved from simple
beginnings to a sophisticated design which automatically adapts to
changes in operating environment without manual configuration or real-
time management functions. The algorithm has been implemented both in
the NTP software itself and, for the highest accuracy, in the operating
system kernel.

The current NTP Version 3 discipline, together with improvements already
implemented and described in [4], represent the departure point for this
paper. The new algorithm is intended for NTP Version 4, which is in
final testing, but could be used in principle with any protocol that
provides periodic time corrections. A key feature in the new design is
improved accuracy to the order of a few microseconds at the application
program interface with fast, modern workstations. The need for this
becomes clear upon observing that the time to read the computer clock
via a system call routine has been reduced from 40 ms a few years ago on
a Sun Microsystems SPARC to less than 1 ms today on an UltraSPARC.
Another valuable feature is that message intervals can be increased well
over the current maximum without significant impact on accuracy.

While not in itself the subject of this paper, a brief overview of the
NTP design will be helpful in understanding the algorithms involved. A
comprehensive description of the NTP architecture, protocol and
algorithms is in [4] and citations found there. The global NTP time
synchronization subnet is a hierarchical tree of time servers, and
clients organized in much the same way as digital telephone
synchronization networks. The primary servers at the root of the tree
are synchronized to national standards by radio or telephone modem. In
order to provide the most accurate and reliable service, secondary
servers and clients typically operate with several redundant servers
over diverse network paths.

The NTP software operates in each server and client as an independent
process. At designated intervals, a client sends a request to each in a
set of configured servers and expects a response at some later time. The
exchange results in four clock readings, one at the sending and
receiving times for each round. From these, the client calculates the
clock offset and roundtrip delay relative to each server separately and
presents these data to a set of grooming algorithms. The client also
calculates a distance metrics which is used to organize the NTP subnet
itself as a shortest-path spanning tree with root at the primary
servers. The grooming algorithms select the best samples in the data
stream from each server, exclude provably incorrect servers, and combine
the data from the survivors to produce a time correction, update in the
following, which drives the clock discipline algorithm.

This paper continues with a detailed description of the new discipline,
including its state machine and subalgorithms which adjust the clock
time and frequency and manage the various parameters. The paper ends
with a suite of experiments designed to validate the design and
calibrate its performance using both synthetic noise generators and real
data from Internet paths spanning continents and oceans.

2. The NTP Clock Discipline Algorithm

In both the current NTP Version 3 and new Version 4 designs, the
discipline corrects the computer clock time, compensates for its
intrinsic frequency error, and adjusts the various parameters
dynamically in response to measured network latency variations or jitter
and oscillator frequency stability or wander. The algorithms function as
a combination of two philosophically quite different feedback control
systems. In a phase-lock loop (PLL) design, the updates are used
directly to minimize the phase error and indirectly to minimize the
frequency error. In a frequency-lock loop (FLL) design, the updates are
used directly to minimize the frequency error and indirectly to minimize
the phase error. As shown later, a PLL usually works better when network
jitter dominates, while a FLL works better when oscillator wander
dominates.

The discipline is implemented as the feedback control system shown in
Figure 1. The variable qr represents the reference phase produced by the
update and qc the control phase produced by the variable-frequency
oscillator (VFO), which controls the computer clock. The phase detector
produces a signal Vd representing the instantaneous phase difference
between qr and qc. The clock filter functions as a tapped delay line,
with the output taken at the tap selected by the algorithm. The clock
selection, clustering and combining algorithms (not shown) combine the
data from multiple filters to produce the signal Vs. The loop filter,
with impulse response F(t), produces the signal Vc which controls the
VFO frequency wc and thus its phase qc. The Vc signal is generated by an
adjustment process which runs at intervals of one second. The
characteristic behavior of this model, which is determined by F(t) and
the various gain factors, is studied in many textbooks and summarized in
[4].

The original NTP Version 3 discipline, which is based on a conventional
PLL, has been improved to include a FLL capability. The selection of
which mode to use, FLL or PLL, is made on the basis of update interval
t. In the improved design, PLL mode is used for t less than 1024 s and
FLL mode is used from 1024 s to 4.5 h, which is the maximum possible in
this design. This improves the accuracy and stability in some modes of
operation, specifically those involving large t, but has proved
suboptimal for reasons described later in this paper. The new Version 4
discipline uses a combined approach with a true hybrid PLL/FLL design
which gives good performance with t from a few seconds to well over a
day, depending on accuracy requirements and acceptable network overhead.

2.1 Time and Frequency Discipline

In the new design, the loop filter of Figure 2 is implemented using two
subalgorithms, one based on a conventional PLL and the other on a FLL
design suggested in [2]. Both predict a time correction x as a function
of Vs. The PLL predicts a frequency adjustment yPLL as an integral of
Vst, while the FLL predicts an adjustment yFLL as an exponential average
of . The two adjustments are combined and added to the current clock
frequency y, as shown in the figure. The x and y are then used by the
clock adjust process to control the VFO frequency and close the feedback
loop shown in Figure 1.

An overview will suffice to describe how the correction and prediction
algorithms work. The details, which are given in [5], are beyond the
scope of this paper. At each update, a phase correction

(1)

is initialized, then exponentially amortized in Vc by the clock adjust
process at rate 1/Tc, where Tc is the loop time constant. This is
necessary to avoid step changes in time, as well as to implement the lag
network required for a stable PLL.

In PLL mode, y is a time integral over all past values of Vs, so

(2).

From [4], this results in a transfer function

(3),

where is the loop gain and is the corner frequency. From elementary
theory, this is the transfer function of a linear, time-invariant, PLL
which can drive both time and frequency errors to zero.

For good stability, Tc must be at least several times the total loop
delay which, because of the clock filter delay, is several times t. When
the discipline is first started, a relatively small t = 64 s is
required, in order to achieve the required discipline capture range of
500 parts-per-million (PPM). Selecting a compromise Tc = 1024 s, the PLL
response to a time step has a risetime of 53 minutes, an overshoot of 5%
and a 63% response to a frequency step of 4.25 hours. Ordinarily, t
increases substantially once the frequency has stabilized and Tc
increases in proportion.

In FLL mode, yFLL is an exponential average of past frequency changes,
as computed from Vs and t. The goal of the algorithm is to reduce Vs to
zero; so, if this has been successful in the past, previous values can
be assumed zero and the exponential average becomes simply

(4),

where w is a constant determined so that the averaging time is close to
the Allan intercept, as described later.

2.2 Hybrid FLL/PLL Combining Algorithm

The NTP Version 3 discipline selects either the PLL or FLL mode on the
basis of t. In general, PLL mode is used for t < 1024 s, which is
appropriate for network servers, and FLL mode is used at t > 1024 s,
which is appropriate for modem servers. While this results in a useful
compromise, it does not provide for the automatic selection on the basis
of prevailing network jitter and oscillator wander. It also results in a
moderate transient when switching between modes. In the NTP Version 4
design, the feedback loop is modified to operate as a true hybrid, where
FLL and PLL frequency predictions are computed separately and combined
on the basis of RMS error over previous predictions. The result is that
the PLL prediction is weighted more heavily under conditions of extreme
network jitter due, for example, to network congestion, while the FLL
prediction is weighted more heavily under conditions of extreme
oscillator wander due, for example, to large local temperature
variations.

As in the original design, Tc is the loop time constant and t the
interval since the previous update. In the new design, x is the residual
time correction and yadj is the frequency adjustment at the next update.
First, the xFLL and xPLL prediction errors are computed from these
values and the frequency predictions at the last update:

(5).

The RMS prediction errors and are computed from successive samples of
xFLL and xPLL. The number of samples averaged, or equivalently the
averaging time, depends on Tc, which itself depends on t. The intent of
this design is to keep the averaging time near the Allan intercept.

Next, new values for x, yPLL and yFLL are determined using (1), (2) and
(4). The prediction errors determine the proportional gain for the PLL
and FLL frequency adjustments used to compute yadj:

(6).

Figure 3 shows typical standard error (RMS) curves as a function of t in
FLL mode (solid), PLL mode (dash-dot) and hybrid mode (dash). For this
particular network path, the optimum choice is PLL mode below 200 s and
FLL mode otherwise; however, the optimum choice varies widely, depending
on the particular network path involved and degree of congestion. As a
compromise, the original NTP Version 3 discipline used PLL mode below
1024 s and FLL mode otherwise. In the new hybrid mode, PLL predictions
are usually better at small t, while FLL predictions are better at large
t. From about 100 s to 1000 s, hybrid mode is better than either PLL or
FLL modes and above 1000 s is only a little worse than FLL mode. Below
100 s hybrid mode is somewhat worse than PLL mode and a little better
than FLL mode, but still better than 1 ms.

2.3 Poll-Adjust Algorithm

NTP Version 3 time servers and clients operate today using network paths
that span the globe. In very many cases, primary servers operate with
several hundred clients or more. It is necessary to explore every means
with which t can be increased without significantly degrading clock
accuracy or stability. The NTP Version 4 discipline allows a significant
increase in t without compromising accuracy, while at the same time
adapting dynamically to widely varying network jitter and oscillator
wander regimes.

The experiments described later in this paper show that, in almost all
cases, the standard error increases as t increases, due primarily to the
effect of oscillator wander. Since the overhead decreases as t
increases, a method is needed to select t as the best compromise between
required accuracy and acceptable overhead. This is most important in
configurations where a toll charge is incurred for each poll, as in ISDN
and telephone modem services.

In the NTP Version 3 design, the minimum and maximum t default to values
appropriate for almost all network and computer configurations. For
network servers, t can range from 64 s to 1024 s, while for telephone
modem servers, it can range from 1024 s to 16,384 s. However, in NTP
Version 4, the upper limit can range up to 131,072 s, or well over a
day. The discipline automatically manages t within these ranges in
response to the prevailing network jitter and oscillator wander. An
important point is that, using the new discipline, it is not necessary
to clamp t to the minimum when switching among different synchronization
sources as in NTP Version 3. In cases of moderate to severe network
jitter and with multiple sources, this sometimes causes frequent
clockhopping which in turn degrades accuracy.

A key statistic in controlling t is the system dispersion, which is a
determined from the RMS errors measured by the grooming algorithms [5].
If is greater then the system dispersion by an experimentally determined
constant factor, the oscillator frequency is deviating too fast for the
discipline to follow. In this case, t is reduced in stages to the
minimum. If the opposite case holds for some number of updates, t is
slowly increased in steps to the maximum. A hysteresis mechanism built
into the algorithm prevents unnecessary dithering of t. Under typical
operating conditions, t hovers close to the maximum; but, on occasions
when the oscillator frequency wanders more than about 1 PPM, it quickly
drops to lower values until the wander subsides.

Figure 4 shows the time and frequency offsets and update interval for a
typical workstation over a 30-day period. In this figure, the baseline
is +10 for the t curve, zero for the time curve, and -20 for the
frequency curve. Most of the time is spent at the maximum interval, in
this case 16,384 s, with brief excursions to lower intervals not less
than 1024 s when the frequency deviates too rapidly for the discipline
to follow. This particular figure shows the expected behavior for a
typical telephone modem, where it is important that t remain at large
values whenever possible. In particular, it is important that the
initial frequency adaptation when the discipline is first started be
substantially complete within only the first few samples before t starts
to increase. In Figure 4, where the initial time and frequency offsets
are zero, this occurs after the eighth call.

2.4 Clock State Machine

The discipline must operate over an extremely wide range of network
jitter and oscillator wander characteristics without manual intervention
or configuration. As determined by past experience and experiment [4],
the various data grooming algorithms work well to sift good data from
bad, especially under conditions of light to moderate network and server
loads. However, under conditions of extreme network or server
congestion, operating system latencies, and oscillator wander, any
discipline may perform poorly and even become unstable.

In order to deal with very large transients at and after startup, the
discipline is managed by the state machine shown in Figure 5. Each of
the four states has defined inputs, outputs and transition functions.
Initially, the machine is unsynchronized and in UNSET state. If the
minimum t is greater than 1024, the first update received sets the clock
and transitions to HOLD state. This behavior is designed for toll
services with long intervals between calls. If t is less than 1024 s,
these actions will not occur until after several updates, to allow the
data grooming algorithms to accumulate sufficient data for reliable
synchronization.

In HOLD state the discipline is locked in FLL mode, in order to provide
rapid adaptation to possibly very large clock oscillator frequency
errors. Once entering HOLD state, the machine remains in this state for
several updates, in order to complete, as far as possible, the frequency
adaptation process. After the correction q has decreased below 128 ms,
the machine transitions to SYNC state and remains there pending unusual
conditions.

In SYNC state, a suite of sanity checks, spike detectors and tolerance
clamps are operative. To avoid disruptions due to frequency spikes, the
frequency is clamped not to exceed 500 PPM and frequency adjustments not
to exceed 10 PPM. In addition, frequency adjustments are disabled if the
system dispersion exceeds 128 ms. To avoid disruptions due to occasional
network delay transients, spike detectors reject updates greater than
ten times the dispersion. However, the spike does increase the
dispersion; so, if a true time step were to occur, the first one or two
samples, may be discarded, but eventually the step will be recognized
and corrections made.

Special provisions are made for updates larger than 128 ms, which may
happen occasionally when the network is badly congested or when some
major disruption occurs. It can also happen upon the occasion of a leap
second, when the clock has been automatically stepped by the protocol,
but for some reason the source has not implemented the step. In both
cases, the best response is to ignore large updates, unless they persist
for some time.

However, if no updates less than 128 ms arrive after a timeout interval,
currently 900 s, the eventual response should be to believe any that are
received, regardless of value. The first large update after the timeout
causes a state transition to SPIKE state, but does not set the clock. If
the next update after that is less than 128 ms, the machine transitions
to SYNC state and proceeds normally. In this case the prior update is
considered a spike and ignored. If the next update in SPIKE state is
greater than 128 ms, the machine transitions to HOLD state and sets the
clock. Since the sanity checks are disabled in HOLD state, the
discipline can quickly adapt to the new time and frequency as described
previously.

Figure 6 shows an extreme example, where the initial time offset is 100
ms and initial frequency offset is 500 PPM. Most of the frequency
adaptation occurs in HOLD state, as the FLL wrangles the frequency to
approximate agreement. The initial time offset is purposely chosen so
that the residual offset after this initial adaptation results in the
state machine first transitioning to SYNC state, but with residual
frequency offset beyond the loop capture range in that state. This
eventually causes the offset to exceed 128 ms and, after the timeout
interval, the state machine transitions to HOLD state. This process may
repeat a cycle or two, until the residual frequency offset is
sufficiently reduced. In Figure 6, after about six hours, the adaptation
is complete and t starts to ramp up. In the original NTP Version 3
design, the loop capture range was only 100 PPM and adaptation to even
that unambitious value took well over a day.

3. Performance Evaluation

The performance of the new discipline has been evaluated using a
simulation approach. There are three reasons for this, rather than
building and testing the algorithms in the context of the existing NTP
Version 3 implementation. First, evaluation of these algorithms can take
long wallclock times, since the intrinsic time constants are often quite
long - several hours to days. Simulation time runs much faster than real
time, in fact by several orders of magnitude. Second, the simulation
environment is not burdened by the infrastructure in which the real
software must operate, such as I/O and monitoring code. Third, the
simulator code itself, written in portable C, represents a model for the
eventual formal specification and implementation, as well as a
documentation aid. This is the same approach that proved highly
successful in the development of the original NTP Version 3
specification and implementation.

The simulator faithfully mimics the operation of the implementation,
including the entire suite of data grooming algorithms, as well as the
discipline itself. It can use timestamp data collected by the existing
software operating over real Internet paths, as well as data synthesized
by internal phase and frequency noise generators. The generators produce
white phase noise and random-walk frequency noise, both using zero-mean
Gaussian processes. As shown in subsequent discussion, these generators
closely emulate the real data for most statistical purposes. The general
philosophy is to evolve the algorithms using repeatable, synthetic noise
streams and, once the design has stabilized, verify the expected
operation using real data.

An important strategy in the experiments with real data is the exclusive
use of 19 primary servers in Europe, Asia and the Americas. Since
primary servers are independently synchronized to external references,
such as GPS receivers or ACTS telephone modems, their clocks have
presumed zero time and frequency error. This allows real phase data to
be collected without contamination by errors due to frequency
variations. Real frequency data are separately collected using a series
of measurements with a free-running clock oscillator and a precision
time reference source. The phase and frequency data are collected in
data files which are read by the simulator and integrated with synthetic
noise as required.

There are three sets of experiments described in the following. In the
first set, the behavior of typical Internet paths and computer clocks is
modeled using a special statistic, in order to develop synthetic noise
profiles which characterize common client-server configurations. In the
second set, the behavior of the model is explored using these profiles,
in order to predict how the errors due to network jitter and oscillator
wander scale relative to t. These provide the basis and justification
for the new fully hybrid discipline design. In the third set, the model
is evaluated using real phase and frequency data, in order to verify the
model and discipline behave as predicted.

3.1 Computer Clock Modeling

The first set of experiments is designed to develop a model
characterizing a typical computer clock oscillator synchronized via
Internet paths. The experiments are designed to do two things: (a)
evaluate the error characteristics of typical oscillators and Internet
paths between a University of Delaware site and selected remote sites
and (b) validate that the synthetic random noise generators incorporated
in the simulator realistically emulate these characteristics on a
statistical basis.

That typical computer clocks behave in ways quite counterproductive to
good timekeeping should come as no surprise. There are no explicit means
to control crystal ambient temperature, power level, voltage regulation
or mechanical stability. For instance, in a survey of about 20,000
Internet hosts synchronized by NTP, the median frequency error was 78
PPM [3], with some hosts showing errors over 500 PPM. Since the clock
oscillator is not temperature stabilized, its frequency may vary over a
few PPM in the normal course of the day.

The traditional characterization of oscillator stability is a plot of
Allan variance [1], which is defined using a series of time differences
measured between a computer clock and some external standard. Let xk be
the kth measurement and t be the interval between measurements. Define
the fractional frequency

(7),

which is a dimensionless quantity. Now, consider a sequence of N
independent fractional frequency samples for k = 1, 2,..., N. If the
measurement interval is the same as the averaging interval, the 2-sample
Allan variance is defined

(8)

and the Allan deviation as the square root of this quantity. The results
are commonly displayed as a curve plotted in log-log coordinates as
described below.

Two experiments were designed to establish Allan deviation
characteristics for typical workstation clocks under typical room-
temperature conditions. The PPS experiment measured time differences
between a free-running SPARC IPC clock and a directly connected cesium
clock. Data were collected at 2-s intervals over five days and recorded
in a data file. The LAN experiment measured differences between a free-
running SPARC IPC clock and a primary server on the same network wire.
Data were collected at 16-s intervals over fifteen days using NTP and
recorded in a data file.

The oscillator frequency computed from the LAN data is shown in Figure
7. The frequency varies over a 3-PPM range, sometimes abruptly, which is
characteristic of room temperature changes of a few degrees. This
experiment was conducted in spring, when the laboratory windows were
open and the temperature allowed to follow the weather, and is typical
of �poor� stability. The oscillator frequency computed from the PPS data
(not shown) is similar to the LAN characteristic, but varies over a much
smaller 0.25 PPM range. This experiment was conducted in winter, when
the room temperature was thermostatically controlled, and is typical of
�good� stability.

The results of the PPS and LAN experiments were processed to produce
plots of in PPM against t in seconds in log-log coordinates. The results
for the PPS and LAN data are shown as solid lines in Figure 8. The shape
of each curve depends on the phase and frequency variations specific to
the particular clock oscillator and network path involved. The phase
variations, which are dominated by white phase noise, produce straight
lines with slope -1 on the plots [6]. The frequency variations, which
are dominated by random-walk frequency noise, produce straight lines
with slope +0.5 on the plots. The intersection of these two lines,
called here the Allan intercept, characterizes each particular clock
oscillator and network path.

The Allan intercept is the primary statistic influencing the design of
the discipline. For t below this point, there is an underlying frequency
tendency, so increased averaging can improve the accuracy; however,
above this point increased averaging actually degrades the accuracy.
This various time constants used in the discipline are engineered with
these facts in mind.

The primary contributions to the phase noise evident in the results for
the PPS and LAN data are jitter due to the clock precision and reading
errors, operating system latencies and, for the LAN experiment,
latencies contributed by a lightly loaded 10-Mbps Ethernet. In both the
PPS and LAN experiments, the frequency variations are due to
nondeterministic wobbles of the oscillator frequency, which is affected
primarily by ambient temperature variations.

The dashed lines on Figure 8 show the Allan deviations determined from a
random sequence where each value consists of the sum of a white phase
noise sample and a random-walk frequency noise sample. White phase noise
is generated directly from a Gaussian distribution. Random-walk
frequency noise is generated by integrating samples from a Gaussian
distribution. The standard deviation parameter of each noise generator
was chosen by experiment so that the synthetic characteristics closely
match the measured characteristics. For comparison, the NOISE curve on
the figure was generated using a phase parameter of zero and the
frequency parameter of the PPS experiment. This models the case where
the only errors are due to the simulated clock precision of 1 ms, which
represents the best performance possible with the common Unix
microsecond clock.

The PPS and LAN results represent cases involving external sources,
either directly connected or connected by a LAN. However, these results
do not represent cases where long Internet paths are involved. In the
next set of experiments, the simulator used timestamps measured at 64-s
intervals over a ten-day period in fall, 1996, between server
pogo.udel.edu and the 19 remote servers, individually and in
combination. In these experiments the NTP data grooming algorithms were
connected, but the feedback loop was disconnected, so the clock
oscillator ran at constant frequency determined by the external source.

The Allan deviations for representative sources are shown in Figure 9.
The NOISE, PPS and LAN results duplicate the data of Figure 8 for
comparison. The BARN data represent a �local� server, in this case on
the same network wire as pogo, while the USNO data represent a �nearby�
server at the U.S. Naval Observatory in Washington, DC, and the IEN data
represent a �distant� server at the IEN Galileo Ferraris in Torino,
Italy. The USNO data fairly well represent a path between two servers in
the U.S., where the path is only lightly congested, while the IEN data
represent moderate to heavy congestion typical of paths spanning the
Atlantic.

The curves described so far involve only a single server, which is the
only source used to synchronize the clock. The PEERS curve represents
the case where all servers, except those on the local LAN, are used, so
that the intersection and clustering algorithms can select the best
subset of sources to synchronize the clock. Note that the PEERS curve is
actually lower (better quality) than the �best� remote server, in this
case USNO, which suggests that the data grooming algorithms do in fact
deliver time more accurate than available from any single server
separately.

Since all the servers in these experiment are synchronized to external
sources, the phase noise of each remote source can be determined
independently of the frequency noise. All the curves appear as slightly
wiggly straight lines with slope -1 at low t values, which is indicative
of white phase noise, as described previously. On the other hand, the
NOISE, PPS and LAN curves, which represent free-running clock
oscillators, inflect upward, as expected with random-walk frequency
noise. If the remote servers were not externally synchronized, their
curves would inflect upwards in the same manner as the PPS and LAN
curves.

So far, synthetic noise models for only the NOISE, PPS and LAN sources
have been shown to closely approximate the real noise characteristics.
As the data shown in Figure 9 were measured using the NTP simulator and
open-loop conditions, the question remains as to how faithfully the
white phase/random-walk frequency model applies to the other sources
under these conditions. The solid lines in Figure 10 show the measured
Allan deviation for each source, while the dashed lines show the results
with synthetic noise generated by the NTP simulator program. For this
purpose, the frequency parameter was set to zero, while the phase
parameter was set to agree with the actual phase noise at t = 64 s.

The results show general agreement in all cases at the lower values of
t, but some disagreement at the larger values for some servers. Since
the phase modeling is most important at small values of t, the
disagreement at large values is not considered significant. It is of
interest in passing to compare the USNO curve, which includes the
grooming algorithms, with the USNO (raw) curve, which used the same
data, but with these algorithms disabled. With these algorithms in
place, the errors are reduced by well over a factor of ten.

The results suggest that a useful predictive model for ordinary computer
clock oscillators and real networks can be determined from the Allan
intercept point (t = x, s = y) and assuming white phase noise for t < x
and random-walk frequency noise for t > x. In this model, the Allan
deviation is completely determined by the white phase noise
characteristic of the network and operating system and the assumption of
�good� (PPS) or �poor� (LAN) clock oscillators. A convenient x intercept
for graphical analysis is t = 64 s, which can be used to predict the y
intercept point at any other t by simply multiplying by the ratio of the
selected value to 64. The value t = 64 s is particularly useful, since
this is the default minimum used in the NTP protocol. Figure 11 shows
the y intercept at this value and the equivalent phase noise generator
parameter. In this table, the y intercept is determined directly from
the Allan deviation plots. The p parameter of the synthetic PPS data is
varied until the y intercept matches that of the measured PPS data. In a
similar manner, the p parameter of the synthetic LAN data is varied
until the y intercept matches that of the measured LAN data. The
remaining p parameters are determined by scaling proportionally to the
PPS value.

3.2 Performance Using Real and Synthetic Noise Sources

The next set of experiments is designed to explore the performance of
the NTP data grooming and discipline algorithms with synthetic noise
generators and the parameters of Figure 11 over a simulated interval of
30 days. In each experiment, random timestamps are generated according
to the statistical model for each of the sources and the standard error
of the discipline determined for t from 64 s to 131,072 s. The intent of
each experiment is to determine how the errors scale with t using both
PLL and FLL modes. This is done first using synthetic phase noise only,
then using synthetic frequency noise only.

Figure 12 shows the results with the USNO source for both PLL mode
(solid lines) and FLL mode (dashed lines) as a function of t. When t is
varied with phase noise only (p), the standard error is approximated by
the two nearly horizontal lines. In this case, the PLL outperforms the
FLL by a factor of about ten. When t is varied with frequency noise only
(f), the standard error is approximated by the two lines with slope near
1.4. In this case, the FLL outperforms the PLL by a factor of about ten.
In fact, the PLL becomes unstable at t > 4,096 s, as evidenced by the
absence of plotted points in the figure. In this and other cases, the
criterion for instability is the occurrence of one or more step
corrections of 128 ms or more during the simulation run.

As evident from Figure 12, the PLL alone usually performs better under
conditions of high phase noise and low frequency noise, while the FLL
alone usually performs better under conditions of high frequency noise
and low phase noise. While the generator parameters used in constructing
the figure are typical of a network path consisting of two LANs
connected by a relatively uncongested T1 network, the y coordinates of
the lines scale directly as the noise parameters, but retain the slopes
shown. Thus, since the frequency noise is assumed fixed in the
experiment, the selection of which discipline mode to use depends only
on the phase noise of the particular network path involved.

Assuming the frequency noise is fixed by oscillator type, it would seem
that a simple measurement of phase noise would suffice to determine the
Allan intercept and optimum point to switch between FLL and PLL modes.
As in [2], the phase noise could be measured for each Internet server
prior to regular operation and the optimum point determined. However, as
confirmed by experiment, this is not practical in the current Internet.
Figure 13 shows the absolute phase noise in log-y coordinates measured
for the IEN path, which is typically congested during working hours in
Europe and the U.S. The phase noise varies over three decades during the
hours and days represented in the figure. An NTP client synchronized via
this path would have to periodically measure the phase noise and
recompute the optimum point. In principle, this could be done at
intervals throughout the working day and a profile developed. This is
still only an approximate solution and does not allow for minute-by-
minute adjustment of the optimum point. This is in fact the motivation
for the hybrid mode, in which the weight factors for the FLL and PLL
predictions are determined in real time from the measured prediction
errors.

Figure 14 shows a composite of the standard errors with noise parameters
for each source in PLL mode (dashed lines) and hybrid mode (solid
lines). For most sources over most of the range, the errors in hybrid
mode are about ten times less than in PLL mode. The exceptions are when
the phase noise overwhelms the frequency noise, which occurs for the
most distant servers at small t.

Comparing Figure 14 with Figure 9, it is apparent the inflection point
where the curves corresponding to each source depart the asymptotes on
Figure 14 are roughly a factor of ten above the Allan intercept point on
Figure 9. This is an interesting comparison, in spite of the fact that
the former shows standard error, which is a measure of time differences,
while the latter shows Allan deviation, which is a measure of frequency
differences. In addition, Figure 14 shows data collected under closed-
loop conditions, while Figure 9 shows data collected under open-loop
conditions. Nevertheless, the results suggest that an Allan deviation
characterization of each network path and clock oscillator can be an
accurate predictor of discipline performance.

The final set of experiments is designed to determine how closely the
discipline with synthetic noise emulates the discipline with real data.
Figure 15 shows the standard error for real data with PLL mode (dashed
lines) and hybrid mode (solid lines). Each curve shows the results with
the phase data of Figure 9 and the frequency data of Figure 7. Compared
to Figure 14, the curves for real data are somewhat jagged, mostly due
to the spiky frequency characteristic evident in Figure 7, but in
general follow the same characteristic as synthetic noise. As evident in
the figures, the performance with real data is not quite as good as with
synthetic data; however, the performance in hybrid mode is almost always
better than PLL mode.

4. Conclusions

The results demonstrated in this paper show a substantial improvement in
the performance of the NTP Version 4 discipline over both the original
NTP Version 3 algorithm and the improved algorithm described in [4]. In
addition, the new algorithm automatically selects the optimum
combination of FLL and PLL data over a wide range of network jitter and
oscillator wander while in regular operation and not requiring initial
calibration. Perhaps the most striking result is that the new discipline
is effective with t well over one day, which is an attractive feature
when telephone toll charges are involved.

As the complexity represented by the suite of crafted NTP algorithms
have grown, it has become necessary to test and validate their
performance by analysis and simulation. A good deal of the discussion in
this paper has focused on the nature of the simulation process, design
and analysis of the synthetic noise models and validation of the
discipline performance in systematic simulation exercises. Many times
during these exercises interesting things happened which resulted in
significant improvements in the algorithm design. While this paper has
not revealed them, there were in fact a number of fruitless experiments
and dead ends which gave some insight into the relative merit of some
design features, in particular the sanity checks, spike detectors and
tolerance clamps. In addition, the insights gained suggested
modifications which considerably simplified the FLL and PLL design.

Finally, the refinement of the NTP simulator program, while guiding the
development of the various algorithms, may be most valuable as a
specification vehicle, implementation tool and documentation aid for the
actual NTP Version 4 implementation. It should be mentioned in passing
that the simulations demonstrated in this paper require a good deal of
machine time. A full suite of all simulations, including the simulator
program itself and Matlab analysis programs, requires over two hours on
a 300-MHz workstation.

5. References

References 3-5 are available from Internet archives in PostScript
format. Contact the author for location and availability.

1. Allan, D.W. Time and frequency (time-domain) estimation and
prediction of precision clocks and oscillators. IEEE Trans. on
Ultrasound, Ferroelectrics, and Frequency Control UFFC-34, 6 (November
1987), 647-654. Also in: Sullivan, D.B., D.W. Allan, D.A. Howe and F.L.
Walls (Eds.). Characterization of Clocks and Oscillators. NIST Technical
Note 1337, U.S. Department of Commerce, 1990, 121-128.

2. Levine, J. An algorithm to synchronize the time of a computer to
universal time. IEEE Trans. Networking 3, 1 (February 1995), 42-50.

3. Mills, D.L., A. Thyagarajan and B.C. Huffman. Internet timekeeping
around the globe. Proc. Precision Time and Time Interval (PTTI)
Applications and Planning Meeting (Long Beach CA, December 1997).

4. Mills, D.L. Improved algorithms for synchronizing computer network
clocks. IEEE/ACM Trans. Networks (June 1995), 245-254.

5. Mills, D.L. Clock discipline algorithms for the Network Time Protocol
Version 4. Electrical Engineering Department Report 97-3-3, University
of Delaware, March 1997, 35 pp.

6. Stein, S.R. Frequency and time - their measurement and
characterization (Chapter 12). In: E.A. Gerber and A. Ballato (Eds.).
Precision Frequency Control, Vol. 2, Academic Press, New York 1985, 191-
232, 399-416. Also in: Sullivan, D.B., D.W. Allan, D.A. Howe and F.L.
Walls (Eds.). Characterization of Clocks and Oscillators. National
Institute of Standards and Technology Technical Note 1337, U.S.
Government Printing Office (January, 1990), TN61-TN119.

Figure Captions

Figure 1. Clock Discipline Algorithm

Figure 2. FLL/PLL Prediction Functions

Figure 3. Raw/Filtered Data Comparison

Figure 4. Poll Interval Adjustment
Figure 5. Clock State Machine

Figure 6. Initial Transient Adaptation

Figure 7. LAN Frequency

Figure 8. Allan Deviation

Figure 9. Allan Deviation - Combined Data

Figure 10. Comparison of Real and Synthetic Allan Deviation

Figure 11. Allan Deviation Phase Noise Parameters

Figure 12. USNO Phase and Frequency Noise Comparisons with PLL and FLL

Figure 13. IEN Phase Noise

Figure 14. Hybrid Mode Standard Error by Source - Synthetic Noise

Figure 15. Hybrid Mode Standard Error by Source - Real Data