Timekeeping in the Interplanetary Internet

For NASA/JPL we are working on issues involved with timekeeping in the Interplanetary Internet. Harish Nair is poking through the SPICE programs and documentation which may be necessary to predict position and velocity vectors describing moving objects in space and on the surface of Mars. This will require procurement of ephemeris data in some form or another, perhaps assisted by range and range-rate measurements made by NTP itself. Some of this technology is as old as or older than the DARPA SATNET program some 25 years ago, but at that time the earth stations really had only to measure the roundtrip time and the satellites stayed pretty much where they were told.

Our model is that the timescale for all platforms, be they in space or on the surface, runs at a constant rate relative to atomic time (TAI). This creates somewhat of a problem in that, for an event like a supernova explosion, the flash should be observed by all platforms consistent with the quasi-planar wavefront from the source. This requires all platforms to calculate the offset relative to barycentric time (TDB) with respect to their own position in space, which changes continuously. Thus, only the local clock rate is disciplined; the time is left an arbitrary value and the offset from TDB is calculated only when needed.

The scheme of choice goes something like this. Upon transmitting an NTP message, a client determines the apparent TDB time, position vector and velocity vector from the local ephemeris. For symmetry, as required in NTP symmetric mode, these values are included in an extension field. The message includes this extension field along with the transmit timestamp, which, when the clock is not synchronized, may not agree with TDB time. Upon receiving this message, the server determines the apparent TDB time, position vector and velocity vector from its local ephemeris. For stateful servers, these values are saved for the next transmission opportunity. When transmitting the next NTP message, the server again determines the apparent TDB time, position vector and velocity vector from the local ephemeris. The message includes three extension fields corresponding to the position and velocity vectors for each of the three timestamps included in the NTP message. Upon receiving this message, the client again determines the apparent TDB time, position vector and velocity vector from the local ephemeris.

The client now has four timestamps, as in ordinary NTP, plus four sets of position and velocity vectors, one corresponding to each timestamp. There are two unknowns, time offset and roundtrip delay, to be determined, but these depend on the actual ephemeris time, which is not known at this point. An appropriate solution involves an iterative approach in which the apparent time and delay are calculated from the server values, assumed accurate, and the apparent client values. New apparent position and velocity values are determined from the ephemeris and the process iterates.

Complicating the above, the residual clock frequency error may introduce considerable error if the time between updates is relatively long, as would be expected in some cases. After a few measurements the frequency can be disciplined in the usual way, but this affects the position and velocity vectors and the residuals with respect to the ephemeris. What makes frequency-induced errors more nasty is that the frequency may fluctuate due to spacecraft thermal cycles and power management.
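To make the bookkeeping concrete, here is a minimal sketch of the on-wire calculation just described, written in C as in the reference implementation. The extension field layout, the function names and the ephemeris stub are all invented for the illustration; a real implementation would obtain positions from the SPICE routines and would also account for relativistic terms, which are ignored here. As described above, the server values are taken as accurate, and the client light times and offset are refined together.

    /*
     * Minimal sketch of the iterative offset/delay calculation.  The
     * field layout, names and ephemeris stub are illustrative only.
     */
    #include <math.h>
    #include <stdio.h>

    #define C_KM_S 299792.458              /* speed of light, km/s */

    struct eph_sample {                    /* contents of one extension field */
        double tdb;                        /* apparent TDB time, s */
        double pos[3];                     /* position, km */
        double vel[3];                     /* velocity, km/s (unused below) */
    };

    /* stub standing in for a SPICE lookup of the client position */
    static void
    client_ephemeris(double tdb, double pos[3])
    {
        pos[0] = 1.5e8 + 25.0 * tdb;       /* purely illustrative motion */
        pos[1] = 0.0;
        pos[2] = 0.0;
    }

    static double
    range_km(const double a[3], const double b[3])
    {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];

        return sqrt(dx * dx + dy * dy + dz * dz);
    }

    /*
     * t1 and t4 are the client transmit and receive timestamps in the
     * local (unsynchronized) timescale; srv2 and srv3 are the server
     * extension fields at its receive and transmit times, assumed
     * accurate.  The client position at t1 and t4 depends on the offset
     * being solved for, so light times and offset are refined together.
     */
    static double
    solve_offset(double t1, double t4, const struct eph_sample *srv2,
        const struct eph_sample *srv3, double *delay)
    {
        double offset = 0.0, prev, rho1 = 0.0, rho2 = 0.0;
        double cpos1[3], cpos4[3];
        int i;

        for (i = 0; i < 10; i++) {
            prev = offset;
            client_ephemeris(t1 + offset, cpos1);
            client_ephemeris(t4 + offset, cpos4);
            rho1 = range_km(srv2->pos, cpos1) / C_KM_S;  /* outbound light time */
            rho2 = range_km(cpos4, srv3->pos) / C_KM_S;  /* return light time */
            offset = ((srv2->tdb - t1 - rho1) + (srv3->tdb - t4 + rho2)) / 2.0;
            if (fabs(offset - prev) < 1e-9)
                break;
        }
        *delay = rho1 + rho2;
        return offset;
    }

    int
    main(void)
    {
        struct eph_sample srv2 = { 170.0, { 2.0e8, 0.0, 0.0 }, { 0.0, 24.1, 0.0 } };
        struct eph_sample srv3 = { 171.0, { 2.0e8, 0.0, 0.0 }, { 0.0, 24.1, 0.0 } };
        double delay, offset;

        offset = solve_offset(0.0, 340.0, &srv2, &srv3, &delay);
        printf("offset %.6f s, delay %.6f s\n", offset, delay);
        return 0;
    }

The velocity vectors carried in the extension fields are not used in this sketch; they could be used to refine the positions over the light time itself, a correction omitted here.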
Assuming primary servers on Earth together with ephemerides of the transmitter location, the scheme continues to refine the residuals and develop global time. Kalman filters or ARIMA methods might be good tools for dealing with the residuals and steering to the best time.

There is some concern that the expense of these calculations, both in processor cycles and thermal management, may not be justified in all cases. For instance, NTP between Mars orbiters and the surface is no different than NTP between Earth orbiters and the ground. In fact, NTP has flown in space before on an AMSAT satellite, where the embedded Intel processor ran the same code as used on Earth. This assumes the satellite doesn't move very far during the roundtrip propagation time for the NTP message and reply. If finer correction is required, orbital elements could be derived from radio rise and set times and corrections computed on the fly.

There has been some discussion on what the Mars orbiters can do with respect to antenna orientation. There is a limited fuel supply for pointing the antenna at Earth, and it may not be a good idea to spend fuel pointing it at other orbiters or the ground in order to exchange time packets. Also, it is not likely the orbiters can communicate with each other using an omnidirectional antenna and low power, at least most of the time. However, omnidirectional antennas would seem to be the choice when communicating with surface platforms. Assuming the surface platforms can discipline the local clock to some degree of precision, the surface clock could be used as a flywheel to synchronize clocks in the orbiters as they pass over the platform.

Future Plans

The general plan is to evaluate the above approach in detail, especially the impact of the required resources for processor cycles and power management. Of particular concern is the magnitude of frequency wander due to normal thermal housekeeping functions. It is our eventual plan to implement a suite of algorithms which transform data such as the extension fields and timestamps described above to a TDB reference space, so that the existing NTP algorithms can be used without modification to discipline the local clock. We plan to sift through the SPICE library looking for tools and algorithms useful for this purpose. We plan eventually to test the algorithms using the NTP simulator described in the following section.

NTP Simulator

Up to the present time, the various algorithms used in NTP have been evaluated using a special-purpose simulator called ntpsim. The ntpsim program uses algorithms somewhat simplified from those in the reference implementation, but they behave very nearly the same under nominal operating conditions with typical network delay jitter and oscillator frequency wander. The NTP algorithms have grown in complexity over the years, and some quirks of the reference implementation have cropped up from time to time. This has been most apparent when operators, fearful of dreaded backward time steps, have insisted that clocks always be slewed, rather than stepped, even if they are initially in error by a considerable amount, like a week. The algorithms in the reference implementation include a number of heuristic defenses against low-probability events, such as transient spikes, mode changes and server restarts. In principle, it would be possible to incorporate these features in ntpsim; however, this would require a good deal of effort and verification that the features work the same in both simulation and practice.
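The stochastic input such a simulator consumes can be modeled very simply. The following toy generator is written only to make the jitter-and-wander model concrete; it is not taken from ntpsim or the new simulator, and the parameter values are invented. It integrates a random walk in frequency into a clock offset and adds exponentially distributed network delay jitter to each on-wire sample.

    /*
     * Toy generator of offset samples: random-walk oscillator frequency
     * wander plus exponentially distributed network delay jitter.
     * Model and parameters are invented for this sketch.
     */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define JITTER 0.001        /* mean one-way delay jitter, s */
    #define WANDER 1e-8         /* frequency random-walk step, s/s */
    #define STEP   64.0         /* interval between samples, s */

    static double
    uniform(void)               /* uniform deviate in (0, 1) */
    {
        return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    }

    int
    main(void)
    {
        double freq = 0.0;      /* accumulated frequency error, s/s */
        double offset = 0.0;    /* true clock offset, s */
        double sample;
        int i;

        for (i = 0; i < 20; i++) {
            freq += WANDER * (uniform() - 0.5);        /* frequency wander */
            offset += freq * STEP;                     /* integrate to offset */
            sample = offset - JITTER * log(uniform()); /* add delay jitter */
            printf("%6.0f s  offset %12.9f s  sample %12.9f s\n",
                i * STEP, offset, sample);
        }
        return 0;
    }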
Harish Nair looked into the question of whether the existing reference implementation could be modified to function as a simulator driven by a programmed script or by the stochastic source model developed in previous research. This turned out to be easier than first imagined and, in fact, has been completed in the NTP development version. The simulator uses the same code as the daemon, but surrounds it with a special-purpose discrete event simulator. The only changes are in the input/output, timer, clock reading and clock discipline code. The simulator can be driven by explicit impulse generators selected on the command line or by a synthetic noise generator which closely duplicates the effects of network jitter and oscillator wander. In its present form, the simulator supports only one association, where the synthetic source acts like a server and drives the NTP code as a client.

We expect to use the simulator to resolve some difficult scenarios, especially those which play out over a matter of days, such as when the poll interval ramps up to a day or more and the clock discipline is in frequency-lock mode. If the frequency wander suddenly increases, the poll interval must back off to a lower value in order to track the relatively rapid frequency changes. The code that does this is largely cut-and-try and should be optimized. Another area needing simulation is the behavior of the algorithms when stepping is disabled and the clock discipline is forced to slew. This causes big trouble when the frequency and frequency-change clamps kick in as required by the correctness assertions. The general behavior is best described as an unruly pinball machine which almost always (we think) eventually subsides to a stable regime. The problem is that in real life the pinball game can last a week; the simulator can run the course in a few seconds.

Future Plans

While a few minor touchups and features may be done, the simulator is essentially complete and will shortly become available in the public software repository. As mentioned above, we expect to use the simulator for proof-of-concept testing in the Interplanetary Internet time synchronization project.

Autokey

Autokey is a cryptographic protocol for the authentication of servers and clients in a hierarchical network. It uses certificates, timestamped digital signatures and reverse hashes to minimize clogging vulnerabilities, minimize resource exposure to public key computations and reduce packet sizes. The original intent of the design was for NTP servers, clients and peers; however, the same defenses designed for the Autokey protocol can be applied to other services and in non-hierarchical network topologies, including sensor networks. This is the intent of the DARPA and ATC supported research.

There have been several changes and additions in the Autokey protocol specification and reference implementation. Certificates are now managed in an LRU cache shared among all associations. Certificates can be used to sign and verify other certificates. Regular garbage collection removes invalid or expired certificates. The ASSOC request can now retrieve a certificate by subject name. A new message, SIGN, has been added which a client can use to ask that its certificate be signed by another server acting as a certificate authority. The key management procedures have been simplified and regularized. Key refreshment is now entirely automatic and can be driven by a script executed by a Unix cron job.
The script rolls new keys and host certificates, including identity strings, and restarts the daemon. Everything else is automatic, even with symmetric peers, which used to take a very long time to restart. Keys no longer have to be kept in a shared directory on NFS servers, so the root passwords of the clients do not have to be known to each other or to the server.

The X.509 certificate format has been updated to Version 3, so that additional information can be included in a certificate extension field. These fields are intended to carry related credentials such as identification strings and multi-component capability metrics. Identity strings are intended as a unique identifier that binds credentials to the certificate. They could take the form of a retina scan, chip serial number or recommender (in the PGP sense). The intent at this time is to provide a framework in which a suitable scheme can be embedded, without restricting the framework to any particular scheme.

An example identity-string scheme might work as follows. Select a particular symmetric key which is shared by all members of a particular NTP group, such as a campus or department. Concatenate the value of this key with the subject name on the certificate, compute the MD5 message digest, then include the result in the extension field before signing. Clients and servers can then verify that the computed hash matches the extension field and discard bogus requests before wasting processor cycles. With this scheme the selected key is not revealed and intruders cannot manufacture an acceptable certificate. There may be other useful schemes waiting to be discovered in a literature search.

The single most worrisome vulnerability using strong cryptography in sensor networks is the hazard that an intruder runs down the battery by forcing needless signature encryptions and decryptions. The Autokey protocol is specifically designed to minimize such hazards. In order to improve the resilience of the protocol and harden it against clogging attacks, the protocol was subjected to yet another rigorous vulnerability analysis in the hope of simplifying the state machine, improving the error checking and making the protocol specification more regular and readable. The analysis resulted in much simpler and more direct header processing steps in both the protocol specification and the reference implementation. All extension fields now go through the same data extraction and error checking procedures, and the code is more straightforward and readable.

The analysis also suggested more effective defenses against clogging attacks. The only conditions under which an attack might be successful are while the certificate trail is being explored. Once the certificate is verified and the proventic bit is lit, it is not possible to provoke a decryption attack in which the victim client consumes precious processor cycles. On the other hand, there are two remaining vulnerabilities involving encryption attacks. One is when requesting the cookie, in which the client presents its public encryption key and the server encrypts the random cookie and then signs the field. The stateless server has no defense against clogs of this type other than gapping; that is, dropping packets that exceed some preset rate. The client will of course drop the response, since the packet will fail the loopback check. The other hazard is when the client presents a certificate to be signed.
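The identity-string check described earlier is central to deflecting this second hazard, so it may help to make the example scheme concrete. The following sketch uses the OpenSSL MD5 routines; the function names, group key handling and extension field layout are invented, and only the digest construction (shared group key concatenated with the certificate subject name) follows the scheme described above.

    /*
     * Sketch of the example identity-string check.  Names and field
     * handling are invented; only the digest construction follows the
     * scheme in the text.
     */
    #include <string.h>
    #include <openssl/md5.h>

    /* MD5 over the shared group key concatenated with the subject name */
    static void
    identity_digest(const unsigned char *group_key, size_t keylen,
        const char *subject, unsigned char digest[MD5_DIGEST_LENGTH])
    {
        MD5_CTX ctx;

        MD5_Init(&ctx);
        MD5_Update(&ctx, group_key, keylen);
        MD5_Update(&ctx, subject, strlen(subject));
        MD5_Final(digest, &ctx);
    }

    /*
     * Verify the digest carried in the certificate extension field.
     * Returns 1 if the request should be accepted, 0 if it should be
     * discarded before any signature operations are attempted.
     */
    int
    identity_ok(const unsigned char *group_key, size_t keylen,
        const char *subject, const unsigned char *field, size_t fieldlen)
    {
        unsigned char digest[MD5_DIGEST_LENGTH];

        if (fieldlen != MD5_DIGEST_LENGTH)
            return 0;
        identity_digest(group_key, keylen, subject, digest);
        return memcmp(digest, field, MD5_DIGEST_LENGTH) == 0;
    }

Since the check involves only one hash and a comparison, a server can discard a bogus request before performing any signature operations, and the group key itself never appears on the wire.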
To service such a request, the victim server would presumably have to verify the request, sign the certificate and finally sign the extension field. There are two ways to deflect such attacks. First, the client includes an identification string in the X.509 extension field. In the scheme suggested above, the identity check does not involve signature decryption, so a bogus certificate is detected and deflected. Second, the SIGN request is valid only when the client has synchronized to proventic time. This means that the field signature is valid, even if the server cannot tell whether the request is a replay. While still a potential victim of a decryption attack, at least the server will not have to verify the certificate, sign the result and sign the response field.

Future Work

The Internet Draft on the Autokey protocol specification has been under major revision as well. It has been submitted to the STIME task force for final review. We expect it will be issued for last call by the IETF and launched on the standards track. An issue remaining for future work is a more refined scheme for proving identity using an X.509 extension field. The immediate issue is to find a scheme in which compromised members can be expunged from the group and new ones added without affecting working members.

Manycast

Manycast uses IP multicast to broadcast an invitation for potential servers to reveal themselves. It is current work supported by DARPA. Clients receiving responses to these invitations mobilize an association and, after a delay to groom the samples and suppress Byzantine falsetickers, outliers are shaved from the population until only a maximum (three) are left. The scheme may appear straightforward, but it must be intricately engineered to minimize implosion hazards, repair damage when a server is lost, minimize network traffic and ensure robust authentication. Manycast is designed to operate in conjunction with Autokey in a completely fire-and-forget deployment mode.

For a sensor network application, all sensors are configured in both Manycast server and client modes. Some sensors with more powerful radios are configured as primary servers, and the network automatically nucleates around them. As each new association forms, it cranks the Autokey protocol and, when that protocol completes, there is no additional packet or processing overhead other than an occasional twitch to generate a new key list.

The new certificate provisions work very well with Manycast. The initial volley results in certificates from all servers and presents a rich mix of certificates that can be used to sign other certificates. In fact, unless specifically directed otherwise, each of a sizeable number of clients will accumulate all the certificates necessary to proventicate the trail to the primary servers. Several minor changes were made to the Manycast protocol and reference implementation to improve and smooth the response to transients, such as when a server refreshes its server seed or rolls a new certificate and restarts the protocol.

Future Work

The Manycast algorithms and decision metrics are considered mostly mature. An important area that needs more work is a scheme to automatically select a stratum appropriate for the expected accuracy and available resources of the nearby server population. We expect to mine Ajit Thyagarajan's dissertation for applicable heuristics. It is likely that some sort of whisper campaign protocol will be necessary in order to balance the load among the available servers.
Load leveling can be implemented using an extension field which carries, for example, a list of the current servers mobilized by the Manycast client. Manycast servers can use the combined lists to compute a decision threshold for each server. Each server compares a random roll to its threshold to determine whether to respond to a client request. Details remain to be worked out.

Multi-Component Trust Metrics

This work is the primary area of interest in the ATC funded activity. The problem is to determine, from various security-related variables, the level of trust accorded to some specific function. The approach involves a classification algorithm using a multi-component trust vector and defined operations to establish a single trust metric. A confidence threshold would be established above which the neighbor would be considered trustworthy.

Our approach is based on pattern analysis and classification (PAC) methods, which were first developed well over two decades ago. A set of feature vectors records a particular set of measurements for some experiment outcome. The feature vectors are members of an n-dimensional vector space, where each component is associated with a dimension. Sets of decision planes are devised to classify the space of acceptable and unacceptable outcomes for some derived function and to determine a fitness metric.

An example having nothing to do with cryptography, but illustrating the methods, is derived from original work done to reliably decode manually sent Morse code some 25 years ago. A Morse code element consists of a mark (tone) followed by a space (silence). Marks can be dot or dash intervals; spaces can be element, character or word intervals, giving a total of six possible elements represented in a space of two dimensions for mark and space. A set of three decision planes, lines in this case, can be defined to classify these six elements as determined by maximum a priori probabilities. A similar approach could be used for the application here, although the decision planes would presumably be fixed by design and not probabilistic.

The feature vectors would involve representations of low-level values, such as the influence of age, how many certificates have been verified, how many recommenders have been found, whether the identity has been verified, whether the neighbor time has been synchronized, and so forth. Every client has such a vector, perhaps depreciated by time, for every known neighbor. A set of decision planes would be necessary for every cryptographic function, such as whether to believe a certificate signed by a neighbor or whether to believe some identification string provided by the neighbor. In point of fact, these functions do not have to be cryptographic and could include such things as recognizing threats of various kinds, revoking keys and so forth.

For the intended use, consider the trust vector to contain components with values between zero and one. The trust vector is determined by an initialization vector and operations on trust vectors received from neighbors, perhaps contained in Autokey extension fields. Since trust often involves a product of values, it may be appropriate to represent components as logarithms of these values. We define an operator that takes as argument two trust vectors and produces a third, presumably more trustworthy than either of the original two. For instance, one of the vectors could be provided by a neighbor, while the other could come from another neighbor or be the client's own trust vector.
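As a concrete, if entirely hypothetical, illustration, the sketch below defines a small trust vector and one possible combining operator. The component set and the rule 1 - (1 - a)(1 - b) are invented for the example; the rule is chosen only because it keeps components in [0, 1] and never produces a result smaller than either argument, matching the intent that the combined vector be at least as trustworthy as its inputs.

    /*
     * Sketch of a trust vector and a combining operator.  The component
     * set and combining rule are invented; the only property taken from
     * the text is that components lie in [0, 1] and the result should be
     * no less trustworthy than either argument.
     */
    #include <stdio.h>

    #define TRUST_DIM 4

    /* hypothetical components: age, certificates, recommenders, sync */
    enum { T_AGE, T_CERTS, T_RECOMMENDERS, T_SYNC };

    struct trust_vector {
        double c[TRUST_DIM];    /* each component in [0, 1] */
    };

    /*
     * Combine two trust vectors.  Using 1 - (1 - a)(1 - b) per component
     * guarantees the result is at least as large as either argument.
     */
    static struct trust_vector
    trust_combine(const struct trust_vector *a, const struct trust_vector *b)
    {
        struct trust_vector r;
        int i;

        for (i = 0; i < TRUST_DIM; i++)
            r.c[i] = 1.0 - (1.0 - a->c[i]) * (1.0 - b->c[i]);
        return r;
    }

    int
    main(void)
    {
        struct trust_vector mine = { { 0.2, 0.5, 0.1, 1.0 } };
        struct trust_vector peer = { { 0.6, 0.3, 0.4, 0.0 } };
        struct trust_vector r = trust_combine(&mine, &peer);
        int i;

        for (i = 0; i < TRUST_DIM; i++)
            printf("component %d: %.3f\n", i, r.c[i]);
        return 0;
    }

A decision function would then compare selected components, or some weighted combination of them, against the confidence threshold mentioned above.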
An operator might be defined for each result vector component, consisting of a list of algebraic operations on two or more argument vector components. For instance, the trust component for the number of valid certificates could be the sum of the corresponding components of the argument vectors. The component to revoke a key might be determined by the product of the component to produce a key, the component to verify the key and the component to sign the key, times a weight factor. In the proposed design the requirement for a vector space is relaxed, although the space must be closed under each of the operators.

Future Work

This activity has just been started and real progress has yet to be made.