Autokey

The main focus of work during the last quarter has been the implementation, testing and deployment of the Autokey public-key cryptographic authentication scheme in NTP Version 4. The initial Autokey protocol design is summarized on the web status page and briefing slides at www.eecis.udel.edu/~mills/autokey.htm. As of late March, this design has been implemented, tested and deployed at selected sites in CAIRN. Several unanticipated problems surfaced in this deployment, and much was learned.

The protocol instantiates many unobtrusive cryptographic values that are updated when the RSA key pair, Diffie-Hellman parameters and related private values are updated in the normal course of operation. Originally, we expected these values to be updated at relatively infrequent intervals, such as once every three months. In that case, it would not be intrusive to restart the NTP daemons in order to recover the latest public values. This turned out to be a significant security vulnerability, especially at or about the time the values were updated. Accordingly, the protocol was redesigned so that public value refreshment does not require a daemon restart and does not have to be synchronous throughout the community of users.

The revised scheme requires every public value used in the protocol, not just the keys and parameters, to be signed and equipped with an authentic timestamp. In order to deflect clogging attacks, the protocol was redesigned with a timestamp check as the first line of defense. In case of a replay attack, old signatures are discarded before the expensive public-key algorithms are invoked. The requirement that every cryptographic value carry a timestamped signature extends to the creation time of the data values themselves, including the keys and parameters generated and the files they occupy.
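To make the timestamp-first defense concrete, the sketch below checks a value's timestamp against the last one accepted before invoking any expensive signature verification. All names and data structures here are illustrative assumptions; the actual Autokey implementation lives in C inside the NTP daemon.

```python
# Hedged sketch of the timestamp-first defense described above.
# All names are illustrative, not the actual Autokey code.

def accept_value(value, assoc, verify_signature):
    """Accept a signed public value only if its timestamp is fresh and
    its signature verifies; the cheap timestamp test runs first so a
    clogging or replay attack never reaches the public-key code."""
    # First line of defense: drop stale or replayed values cheaply.
    if value["timestamp"] <= assoc["last_timestamp"]:
        return False
    # Only now pay for the expensive signature verification.
    if not verify_signature(value["data"], value["signature"]):
        return False
    assoc["last_timestamp"] = value["timestamp"]
    return True
```

The essential point is the ordering: a flood of replayed packets is rejected by an integer comparison, never by a public-key operation.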
The files already have the creation timestamp embedded in the file contents, but it is also necessary that the timestamp be embedded in the file name itself in the form of a file name extension. This allows new file generations to coexist in the various filesystems along with the ones currently in use. A server rolling a new RSA key pair, for example, simply restarts the daemon with the new files, and all clients will notice this and load the new files without stopping their own daemons.

Extensive testing with contrived scenarios pointed out obscure vulnerabilities where an intruder could force a server or client into an unstable state and cause the client to time out, loop, or otherwise destabilize the Autokey or NTP protocol. In all cases found so far, the only completely robust behavior is for the client to ignore errors and attempted intrusions entirely and, in the worst case, time out and reset the cryptographic association. In the redesign, the timeout period is the NTP reachability period, which is eight times the poll interval. In practice, this should represent acceptable behavior, since the clock discipline algorithm is designed to survive such an outage without significant performance degradation.

It is clear that public value dissemination in a large network is going to be a real problem. The present scheme, which requires sending all generated values to a central site and then broadcasting them to all clients, is not acceptable for anything other than testing and experiment. This points up the need for centralized certificate storage and retrieval using directory services such as DNS. However, there still remains a bit of work to provide certificate retrieval as an automatic feature in NTP. Specifically, the current scheme, which uses a forked process to access DNS services, must be upgraded to a fully asynchronous one.
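As an illustration of the naming convention, the sketch below forms a file name whose extension is the creation time in NTP seconds (seconds since 1900), so that successive key-file generations sort naturally and coexist. The base name shown and the choice of NTP seconds for the extension are our assumptions for illustration.

```python
import time

# Offset between the NTP era (1900-01-01) and the Unix era (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800

def keyfile_name(basename, unix_seconds=None):
    """Append the creation time, in NTP seconds, as a filename
    extension so that new file generations coexist with old ones,
    e.g. ntpkey_rsa_host.3165012345 (illustrative name)."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    return "%s.%d" % (basename, unix_seconds + NTP_EPOCH_OFFSET)
```

A client comparing two generations need only compare the numeric extensions to find the most recent files.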
Solutions to this requirement exist; however, it is not clear how generic they will be over the suite of 24 known ports of the software distribution. Fortunately, the volunteer corps of NTP engineers seems to have the porting issues well in hand.

Future Plans

From the above discussion, it is clear some development in protocol design and implementation is necessary to avoid the vulnerabilities mentioned. In truth these are minor modifications of existing code and should be completed in a month or two. Meanwhile, we are beginning work on a comprehensive report describing the Autokey protocol rationale, design and implementation. It will include a vulnerability analysis and threat assessment. We expect to complete the report by the end of May. We have obtained a version of the Secure DNS software and brought it up on a FreeBSD development machine here. We plan to use it in testing the asynchronous resolver and certificate retrieval code.

Network Simulator

The progress of our project in simulating very large networks has been reported in previous quarterly reports and in web pages and briefing slides. As mentioned previously, we believe the simulator can be an important tool for studying the dynamics of very large networks on the order of 10,000 nodes. However, one of the problems identified in the original project goals was the need to simulate real internets, where the structures tend to cluster with fractal-like topologies. Graduate student Tamil Basu is developing a program that can generate random topologies with a fractal character that may more closely emulate the real Internet of today. Mr. Basu started with the work done by recent graduate Robert Redwinski, who developed a topology generator and discrete event simulator capable of supporting 10,000 nodes with an average connectivity of at least three. Preliminary experiments have been carried out on smaller networks on the order of 3,500 nodes with representative routing protocols used in the Internet of today.
The results show the protocols operate as expected in response to induced trauma such as link and node failures. However, the elusive behavior we are looking for, which could explain past observations of anomalous behavior in the ARPANET and NSFNET, has not been found.

The simulator comprises three components:

a) Random Topology Generator (RTG), which generates random candidate networks according to the Waxman model.

b) One or more routing algorithms, implemented as a collection of finite state automata which communicate over the network.

c) Simulation engine, implemented as a conventional discrete event simulator with several features elaborated below.

The routing algorithms, which are at the heart of the simulator, have been chosen to be as generic as possible, yet to expose inherent design limitations. Keeping this consideration in mind, the algorithms that have been implemented include variants of Bellman-Ford (BF) and the Distance Vector Multicast Routing Protocol (DVMRP). While these algorithms have well known design deficiencies, they were chosen to establish the boundaries of network stability and are expected to be replaced with more subtle versions of BF and Dijkstra/SPF. For BF the topology generator provides the user with several command line options, such as split-horizon, in order to more realistically emulate algorithms now in use.

The simulation engine itself is a conventional discrete event simulator including a global event queue and event service modules. The design is heavily influenced by the need for very large numbers of nodes and links. One of its key features is a scheme which maintains only one global network state matrix, together with relatively small local event queues which incorporate local deviations, such as changes in the global matrix while a message is in transit, for example.
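As a toy illustration of the BF variant and the split-horizon option mentioned above, the sketch below implements a distance-vector update and an advertisement that withholds routes from the neighbor they were learned from. The data structures and names are ours, not the simulator's.

```python
# Illustrative distance-vector (Bellman-Ford) routing with the
# split-horizon rule; a toy model, not the simulator's actual code.

INF = float("inf")

def advertise(table, to_neighbor):
    """Distance vector sent to `to_neighbor`.  Split horizon: never
    advertise a route back to the neighbor it was learned from."""
    return {dest: cost for dest, (cost, via) in table.items()
            if via != to_neighbor}

def update(table, from_neighbor, link_cost, vector):
    """Merge a neighbor's distance vector; return True if anything
    in the local table changed (i.e. a new advertisement is due)."""
    changed = False
    for dest, cost in vector.items():
        new_cost = cost + link_cost
        cur_cost, via = table.get(dest, (INF, None))
        # Accept a cheaper path, or any change from the current next hop.
        if new_cost < cur_cost or via == from_neighbor:
            if (new_cost, from_neighbor) != (cur_cost, via):
                table[dest] = (new_cost, from_neighbor)
                changed = True
    return changed
```

Even this toy exposes the classic BF weakness: without split horizon, two neighbors can advertise a failed route back and forth ("count to infinity"), which is exactly the kind of stability boundary the simulator is meant to probe.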
These three components are currently functional and in use for studying the response of the routing algorithms to induced transients and probabilistic failure scenarios. However, we believe the Waxman model, which produces two-dimensional homogeneous network topologies, does not accurately model the Internet, which has a distinctly non-homogeneous structure with LANs connected to MANs connected to WANs.

Approach

In order to model some kind of clustering or fractal topology, the RTG must be overhauled. The RTG is currently an offline process which computes the topology as a database which is then input to the simulation engine. This saves a great deal of time and effort during the simulation, which is of great importance considering the size of the networks, as well as the processing time and memory involved in generating and simulating them. The RTG currently creates topologies based on the Waxman model, which sites nodes at random coordinates of a two-dimensional space and interconnects them randomly with a given average connectivity. This is the principal difference between our model and conventional ones based on OpNET and NS. It directs the output of the topology generation to an ASCII file which is then used by the simulator engine.

The case to be made here is that, even though a random generator is used to develop the topology, it is still unable to mirror the real-life scenario in which networks are built upon each other. Instead it provides a planar network, albeit a random one, a structure which is rarely found in real-life networks. To solve this, we propose to implement the RTG in a somewhat different manner, i.e., to harness the inherent advantages of the Waxman model, but at the same time to implement the overall topology using the properties of fractals, which are a very good model for the manner in which real-life networks exist nowadays.
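For concreteness, the sketch below shows the Waxman edge rule and one level of the proposed fractal construction: generate a Waxman network per cluster, then interconnect the clusters as if each were a single node. The parameter values, function names and the one-level recursion are our illustrative assumptions; the actual RTG is a separate offline C program with its own interface.

```python
import math
import random

def waxman_topology(n, alpha=0.2, beta=0.4, rng=random):
    """Waxman model: site n nodes at random coordinates in the unit
    square and add edge (u, v) with probability
    beta * exp(-d(u, v) / (alpha * L)), L the maximum distance.
    alpha and beta here are illustrative defaults."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    L = math.sqrt(2.0)                 # diagonal of the unit square
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pts[u], pts[v])
            if rng.random() < beta * math.exp(-d / (alpha * L)):
                edges.append((u, v))
    return edges

def stacked_topology(cluster_sizes, inter_links=3, rng=random):
    """One level of the fractal construction: a Waxman network per
    cluster, then random inter-cluster links as if each cluster were
    one node, with endpoints chosen at random inside each cluster."""
    edges, offsets, start = [], [], 0
    for size in cluster_sizes:
        offsets.append(start)
        # Relabel each cluster's local node numbers into a global space.
        edges += [(u + start, v + start)
                  for u, v in waxman_topology(size, rng=rng)]
        start += size
    for _ in range(inter_links):
        a, b = rng.sample(range(len(cluster_sizes)), 2)
        u = offsets[a] + rng.randrange(cluster_sizes[a])
        v = offsets[b] + rng.randrange(cluster_sizes[b])
        edges.append((u, v))
    return edges
```

Applying `stacked_topology` recursively, with each level's clusters themselves built by the level below, would give the LAN-to-MAN-to-WAN layering described above.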
For this purpose there are two candidate strategies:

a) Use the RTG to generate several individual networks of appropriate sizes, stack them one on the other, then recursively generate random interconnects as if each lower network were a single node, with the interconnect endpoints chosen randomly within it.

b) Use the RTG to generate several networks of appropriate sizes, but with different topologies due to the nature of the generation process, and then use them to populate the inner areas of circles of increasing diameter stacked upon each other.

A number of issues must be resolved in either approach. One of the most important is the process for selecting the endpoints of the links connecting two networks. Another is how the nodes are apportioned among the various networks at each level. A main focus of the proof-of-concept validation experiments is how the simulator reacts to network partitioning, where one or more large networks is disconnected from the remaining population for a time, then reconnected again. From past experience, we know this is the acid test of a routing algorithm, since large quantities of routing information must be shuttled over a potentially large fraction of the network routing database.

Future Plans

The extensions to support a fractal-style RTG require significant changes and upgrades not only to the RTG but also to affiliated support modules. We plan to complete the analysis, design and implementation of the revised RTG by the end of May and to continue validation and experiment with it over the summer.

Infrastructure

Problems continue in the deployment of a truly robust NTP subnet for CAIRN. One is the requirement for an upgrade to FreeBSD 3.4 in all routers, in order to support a common suite of cryptographic schemes and radio interfaces. At this time, only isipc6 and udelpc have been upgraded.
A second problem has to do with the serial port hardware and software which support the GPS radios at SAIC, ISI-W and UCL. Modern UART chips include a FIFO with up to 16 stages of delay. The FIFO must be disabled in order to capture precision timestamps from the radio. A related problem common to all Unix systems known to this investigator is a software FIFO implemented in the driver, which also must be disabled. Each driver and each system seems to have a different way to do this, generally requiring a patch or kernel rebuild.

A third problem is apparently peculiar to dartpc. When a serial port is opened and the first character received, the machine hangs in such a way that a power-cycle reset is the only way to continue operation. Although this was first observed in an older FreeBSD version and we thought it would go away in the newer FreeBSD 3.4, apparently it has not. Until this problem is resolved, the UDel time is available only from the other stratum-1 time servers pogo.udel.edu and rackety.udel.edu.

An obvious and attractive alternative to the serial port problems is to use the IRIG signal generated by the GPS radios and connect it to an audio card on the router. The NTP radio clock driver suite includes one that uses classic DSP algorithms to extract timestamps from an IRIG-B signal with accuracies in the low tens of microseconds. Used with our nanokernel modifications, which are now in stock FreeBSD from 3.4, the routers could keep the clock to at least this order of accuracy. However, a generic solution for the FreeBSD audio card interface to the NTP daemon is so far elusive.

We are the happy recipients of a new TrueTime NTS-200 NTP time server with an embedded GPS receiver. This is a new product from TrueTime and is, so far as we know, the first commercial implementation of NTP Version 4. The implementors used an interesting strategy with an embedded Unix kernel and file system preserved in Flash memory.
This approach allows the use of existing protocol modules derived from the NTP software distribution, which should greatly facilitate new feature integration, in particular the public-key cryptography support.

Future Plans

The upgrade of the CAIRN routers to FreeBSD 3.4 is going slowly, but we expect to have at least a cluster of routers near the CAIRN epicenter upgraded and in service with Autokey by the end of May. The situation at UCL has not been encouraging. The GPS radio there has been out of service since the last round of testing. One problem is that the radio itself and the router are separated by a considerable distance, which complicates connection of the PPS signal from the radio to the router. It is not clear what can be done about this.