Design and Analysis of Distributed Algorithms

Chapter 1. Distributed Computing Environment

Because of the multiplicity and variety of distributed systems and networked environments, and of their widespread differences, this book does not focus on any single one of them. Rather, it describes and employs a distributed computing universe that captures the nature and basic structure of those systems (e.g., distributed operating systems, data communication networks, distributed databases, transaction processing systems), allowing us to discard or ignore the system-specific details while identifying the general principles and techniques.

This chapter contains the description of this universe, its components, and its laws (the axioms), and introduces the notions of problems, protocols, cost measures, and knowledge. All these concepts are grounded by means of a sample problem, broadcast, and the design of a simple solution, flooding.
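To make the sample problem concrete, here is a minimal sketch of flooding as a simulated message-passing system; the adjacency-list representation and the FIFO message queue are illustrative assumptions, not part of the model itself.

```python
from collections import deque

def flood(adj, initiator):
    """Simulate the flooding protocol: the initiator sends the message to
    all its neighbours; every other entity, upon receiving the message
    for the first time, relays it to all neighbours except the one it
    came from; later receptions are ignored."""
    informed = {initiator}
    messages = len(adj[initiator])
    # each in-transit message is a (sender, receiver) pair
    queue = deque((initiator, v) for v in adj[initiator])
    while queue:
        sender, receiver = queue.popleft()
        if receiver not in informed:          # first reception: relay
            informed.add(receiver)
            for v in adj[receiver]:
                if v != sender:
                    queue.append((receiver, v))
                    messages += 1
        # otherwise: already informed, discard silently
    return informed, messages

# Ring of 4 entities: with m links and n entities, flooding with a
# single initiator uses 2m - n + 1 messages; here 2*4 - 4 + 1 = 5.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
informed, msgs = flood(ring, 0)
```

The simulation also exposes the cost measure discussed in the chapter: every entity except the initiator relays on all but one of its links, so some messages are inevitably "wasted" on already-informed entities.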

Chapter 2. Basic Problems and Protocols.

The aim of this chapter is to introduce some of the more basic primitive computational problems and solution techniques. These problems are basic in the sense that their solution is commonly (sometimes, frequently) required for the functioning of the system (e.g., broadcast and wake-up); they are primitive in the sense that their computation is often a preliminary step or a module of more complex computations and protocols (e.g., traversal and spanning-tree construction). Some of these problems (e.g., broadcast and traversal), by their nature, are started by a single entity; other problems (e.g., wake-up and spanning-tree construction) have no such restriction. The computational differences created by the additional assumption of a single initiator, as we will see, can be dramatic.

Also included in this chapter are the (multiple-initiator) computations in tree networks. Their fundamental importance derives from the fact that most global problems (i.e., problems whose solution requires the involvement of all entities) can often be correctly, easily, and efficiently solved by designing a protocol for trees and executing it on a spanning tree of the network.
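As a taste of tree computations, the following sketch simulates a convergecast computing the global minimum on a rooted tree: leaves send their value to their parent, and an internal entity waits until it has heard from all its children before sending the combined minimum upward. The parent-pointer representation is an assumption made for concreteness.

```python
def convergecast_min(parent, values, root):
    """Convergecast on a rooted tree: each leaf sends its value to its
    parent; an internal entity combines the values received from all
    its children with its own and forwards the minimum up; the root
    ends up knowing the global minimum.  Exactly n - 1 messages are
    exchanged, one per tree edge."""
    children = {u: [] for u in values}
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)
    pending = {u: len(children[u]) for u in values}  # messages awaited
    best = dict(values)                              # local partial minima
    ready = [u for u in values if pending[u] == 0]   # the leaves start
    while ready:
        u = ready.pop()
        p = parent[u]
        if p is not None:                 # u sends best[u] to its parent
            best[p] = min(best[p], best[u])
            pending[p] -= 1
            if pending[p] == 0:           # parent has heard everybody
                ready.append(p)
    return best[root]

parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
values = {0: 7, 1: 3, 2: 9, 3: 1, 4: 5}
result = convergecast_min(parent, values, 0)
```

A symmetric broadcast down the tree would then inform every entity of the result, which is the pattern reused throughout the book whenever a global problem is solved on a spanning tree.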

The techniques we introduce in this chapter to solve these problems are basic ones; once properly understood, they form a powerful and essential toolset that can be effectively employed by every designer of distributed algorithms.

Chapter 3. Election.

In a distributed environment, applications often require a single entity to act temporarily as a central controller to coordinate the execution of a particular task by the entities. The problem of choosing such a coordinator from a population of autonomous symmetric entities is known as Leader Election. There is no restriction on the number of entities that can start the computation, nor on which entity should become leader.

Since election provides a mechanism for breaking the symmetry among the entities in a distributed environment, it is at the basis of most control and coordination processes (e.g., mutual exclusion, synchronization, concurrency control, etc.) employed in distributed systems, and it is closely related to other basic computations (e.g., minimum finding, spanning-tree construction, traversal).

In this chapter we study the election problem. We start by facing an unexpected impossibility result and by imposing conditions that allow us to bypass it. We then examine the problem and design solutions in specific network topologies, and continue by focusing on the development of election protocols that are generic, that is, that can be used in any network. Most of the focus is on ring networks, since the tools and techniques developed there can be generalized and exported to the other cases.
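To illustrate the flavour of ring election, here is a sketch in the style of the classical Chang-Roberts idea on a unidirectional ring (one of many protocols of this kind; the event-queue simulation is an assumption for concreteness): every entity sends its id around the ring, an entity forwards only ids smaller than its own, and the entity that receives its own id back becomes the leader.

```python
from collections import deque

def ring_election(ids):
    """Election on a unidirectional ring of entities with distinct ids:
    every entity sends its id clockwise; an entity forwards only ids
    smaller than its own and discards the rest; the entity receiving
    its own id back becomes leader (the minimum id wins)."""
    n = len(ids)
    leader = None
    # each in-transit message: (candidate id, position it is reaching)
    queue = deque((ids[i], (i + 1) % n) for i in range(n))
    while queue:
        candidate, pos = queue.popleft()
        if candidate == ids[pos]:         # my own id came back: leader
            leader = candidate
        elif candidate < ids[pos]:        # smaller id: stays in the race
            queue.append((candidate, (pos + 1) % n))
        # candidate > ids[pos]: discarded, that entity is defeated
    return leader

leader = ring_election([5, 2, 8, 1, 9])   # the minimum id is elected
```

Note how the protocol simultaneously breaks symmetry and solves minimum finding, which is exactly the connection between election and the other basic computations mentioned above.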

Chapter 4. Message Routing and Shortest Paths.

Communication is at the basis of computing in a distributed environment, but achieving it efficiently is far from trivial. Consider an entity x that wants to communicate some information to another entity y. An efficient approach is to choose a single path from x to y: the message sent by x will travel only along this path, relayed by the entities on the path, until it reaches its destination y. The process of determining a path between a source x and a destination y is known as routing. If there is more than one path from x to y, we would obviously like to choose the least expensive one, called the shortest path. The process of determining the most economical path between a source and a destination is known as shortest-path routing. The (shortest-path) routing problem is commonly solved by storing at each entity x information that allows it to address a message toward its destination through a (shortest) path. This information is called the routing table.

In this Chapter we will discuss several aspects of the routing problem. First of all, we will consider the construction of the routing tables. We will then address the problem of keeping the information in the tables up to date should changes occur in the system. Finally, we will discuss how to represent routing information in a compact way, suitable for systems where space is at a premium.
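A simple way to see what a routing table contains is a sketch of its construction by iterated distance-vector exchange (a Bellman-Ford-style iteration, simulated here sequentially; the dictionary representation is an assumption made for illustration).

```python
def build_routing_tables(adj):
    """Build shortest-path routing tables by simulated distance-vector
    exchange: every entity repeatedly learns its neighbours' current
    distance estimates and relaxes its own, until nothing changes.
    adj[u] = {v: weight of the link (u, v)}; returns, for each entity,
    its distance table and its next-hop (routing) table."""
    INF = float("inf")
    dist = {u: {v: (0 if u == v else INF) for v in adj} for u in adj}
    nexthop = {u: {v: None for v in adj} for u in adj}
    changed = True
    while changed:                         # one "round" of exchanges
        changed = False
        for u in adj:
            for v, w in adj[u].items():    # u hears v's current vector
                for t in adj:
                    if w + dist[v][t] < dist[u][t]:
                        dist[u][t] = w + dist[v][t]
                        nexthop[u][t] = v  # route to t via neighbour v
                        changed = True
    return dist, nexthop

adj = {"a": {"b": 1, "c": 4},
       "b": {"a": 1, "c": 2},
       "c": {"a": 4, "b": 2}}
dist, nexthop = build_routing_tables(adj)
```

The next-hop table is all an entity needs at forwarding time: to send toward t, entity u simply hands the message to nexthop[u][t].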

Chapter 5. Distributed Sets Operations.

In a distributed computing environment, each entity has its own data stored in its local memory. Some of the items held by one entity may be related to items held by other entities; it is on these related items that we focus and operate. In general, an entity x has a set of relevant data D_x. The union of all these local sets forms a distributed set of data. A query is a request for some information about the global data set, as well as about the individual sets. A query can originate at any entity; if the entity where the query originates locally holds the desired information, the query can be answered immediately; otherwise, the entity will have to communicate with other entities to obtain the desired information.

In this Chapter we first focus on an important class of queries, order statistics; these include finding the median of the set and, more generally, finding the k-th smallest element. The problem of answering such queries is traditionally called selection.
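One simple (not necessarily optimal) way to answer a selection query is a coordinator-driven binary search on the value range: the coordinator broadcasts a candidate value, and each site replies with the number of local elements not exceeding it. The sketch below assumes integer data items; the sequential loop stands in for the message exchanges.

```python
def distributed_kth_smallest(sites, k):
    """Find the k-th smallest element (k >= 1) of the union of the
    local sets: a coordinator binary-searches the value range, and in
    each step every site reports how many of its local elements are
    at most the candidate value.  Assumes integer data items."""
    lo = min(min(s) for s in sites)
    hi = max(max(s) for s in sites)
    while lo < hi:
        mid = (lo + hi) // 2
        # each site counts locally; the coordinator sums the replies
        rank = sum(sum(1 for x in s if x <= mid) for s in sites)
        if rank >= k:
            hi = mid         # the answer is at most mid
        else:
            lo = mid + 1     # the answer is larger than mid
    return lo

sites = [[7, 1, 9], [4, 12], [3, 8]]      # distributed set of 7 items
median = distributed_kth_smallest(sites, 4)
```

Each step costs one query to every site, and the number of steps is logarithmic in the value range, which already hints at why sorted distributions make such queries cheaper.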

Since selection, like most queries, is more easily and efficiently solved if the distribution is sorted, we will also investigate the problem of sorting the distributed data.

We will then concentrate on distributed set operations, that is, computing the union, intersection, and difference of the local sets. The ability to perform such operations has a direct impact on the processing of the complex queries usually performed in databases.

Chapter 6. Synchronous Computations.

In the distributed computing environments we have considered so far, we have not made any assumption about time. We do not know, for example, how much time a communication will take, and no assumption is made on the functioning of the local clocks: their rate, how they relate to each other, or how they relate to communication delays. For these reasons, the distributed computing environments described by the basic model are commonly referred to as fully asynchronous systems. They represent one extreme in the spectrum of message-passing systems with respect to time. At the other extreme are fully synchronous systems, distributed computing environments where local clocks are synchronized and transmission delays are bounded.

A fully synchronous computing environment is dramatically different from an asynchronous one: it provides the protocol designer with computational means and tools that are both unique and very powerful. In this Chapter, we will study these tools and learn how to exploit them to design efficient protocols. Starting with the basic techniques of communicators and pipelining, we will also examine the notion of an asynchronous-to-synchronous transformer, a ``compiler'' that, given as input an asynchronous protocol solving a problem P, generates an efficient synchronous protocol solving P. We will then investigate the two basic techniques that make explicit use of time, waiting and guessing; we will use them in isolation and in conjunction in the context of minimum finding in rings and other networks. We then focus on three synchronization problems: unison (ensuring that all local clocks show the same value), the wake-up or reset problem (all entities must enter a special state), and the firing squad problem (all entities must enter a special state at the same time and for the first time).
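The power of time can be seen in a small sketch of the waiting technique for minimum finding in a synchronous ring: an entity with id v stays silent for v * n clock ticks before flooding its id, so the smallest id floods first and silences everyone else. The lockstep simulation below assumes distinct positive integer ids and unit transmission delay, purely for illustration.

```python
def synchronous_min(ids):
    """Minimum finding in a synchronous ring via 'waiting': an entity
    with id v stays silent for v * n ticks and then floods its id,
    unless a smaller id has already reached it.  A flood crosses the
    ring in fewer than n ticks, so only the minimum ever initiates:
    time is traded for messages.  Assumes distinct positive ids."""
    n = len(ids)
    known = [None] * n           # id each entity has adopted, if any
    in_transit = []              # (id, destination), one tick per hop
    t = 0
    while None in known:
        arriving, in_transit = in_transit, []
        for v, pos in arriving:
            if known[pos] is None:
                known[pos] = v   # adopt and relay in both directions
                in_transit += [(v, (pos - 1) % n), (v, (pos + 1) % n)]
        for i, v in enumerate(ids):
            if known[i] is None and t == v * n:  # waiting time expired
                known[i] = v     # start flooding my own id
                in_transit += [(v, (i - 1) % n), (v, (i + 1) % n)]
        t += 1
    return known

known = synchronous_min([4, 2, 7, 3])
```

Only 2n messages are ever sent, far below any asynchronous lower bound for the problem; the price, of course, is a running time proportional to the value of the minimum id.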

Chapter 7. Computing in Presence of Faults.

In all previous chapters, with few exceptions, we have assumed total reliability, that is, that the system is failure-free. Unfortunately, total reliability is practically nonexistent in real systems. In this Chapter we will examine how to compute, if possible, when failures can and do occur. Clearly, no protocol can be resilient to an arbitrary number of faults. In particular, if the entire system collapses, no protocol can be correct. Hence the goal is to design protocols that are able to withstand up to a certain number of faults of a given type.

In this Chapter we examine the impact that faults have in distributed computing environments. As we will see, the consequences are devastating even when faults are limited in quantity and danger: it is enough that just one entity in a complete network might crash. This Single-Failure Disaster result imposes a dramatic limitation on the design of fault-tolerant protocols. The only way around it (possibly) is by substantially restricting the environment: investing in the software and hardware necessary to make the system fully synchronous; constructing reliable fault detectors; or, in the case of crash faults only, ensuring somehow that all the faults occur before we start, i.e., partial reliability. Alternatively, we can give up certainty on the outcome and use randomization.
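To see how full synchrony circumvents the disaster, here is a sketch of the standard flooding-set approach to consensus with up to f crash failures: for f + 1 rounds every live entity broadcasts all the values it has seen, so some round is crash-free and equalizes everybody's view. The crash model (an entity crashing in round r reaches only a prefix of the others) is a simplifying assumption of this simulation.

```python
def crash_consensus(values, crash_round, f):
    """Synchronous consensus tolerating up to f crash failures: for
    f + 1 rounds each live entity broadcasts the set of values it has
    seen; crash_round[i] = r means entity i reaches only entities
    0..i-1 in round r and is silent afterwards.  Some round has no
    crash, after which all live entities hold the same set and can
    decide on, say, its minimum."""
    n = len(values)
    known = [{v} for v in values]
    for r in range(1, f + 2):                  # rounds 1 .. f + 1
        msgs = []
        for i in range(n):
            if crash_round.get(i, f + 2) < r:
                continue                       # crashed earlier: silent
            targets = range(i) if crash_round.get(i) == r else range(n)
            msgs += [(j, set(known[i])) for j in targets]
        for j, s in msgs:                      # deliver to live entities
            if crash_round.get(j, f + 2) > r:
                known[j] |= s
    live = [i for i in range(n) if i not in crash_round]
    return [min(known[i]) for i in live]       # everyone decides alike

# entity 1 crashes in round 1, reaching only entity 0
decisions = crash_consensus([3, 2, 4, 7, 5], {1: 1}, 1)
```

Even though entity 1's value 2 initially reached a single entity, the extra round spreads it everywhere, so all live entities decide on the same value.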

We study how to achieve fault-tolerance using each of these techniques in the case of localized entity failures. We then study the impact of localized link failures by means of a tale.

Finally, we consider the case of ubiquitous failures, that is, communication faults that can occur anywhere in the system. For the designer of a protocol, these types of faults are much more difficult to handle than those that always occur in the same places. In the latter case, once a fault is detected, we know that we cannot trust that link; with mobile faults, detection will not help us with future events. It is therefore not surprising that the number of dynamic faults that can be tolerated at each time unit is by far smaller than the number of localized and permanent faults we can deal with. What is surprising is perhaps the fact that something can be done at all.

Chapter 8. Detecting Stable Properties.

The types of problems we are going to discuss in this chapter arise in very different contexts and situations, and sometimes they appear to have little (if anything) in common. These problems arise, for example, in the context of global termination: detecting whether a computation (e.g., the execution of a protocol) has globally terminated; garbage collection: deciding whether some distributed objects (e.g., data items) are no longer needed within the system; and deadlock: deciding whether a circular wait has been created within the system, preventing any further progress.

All these problems do, however, share a very important trait: we need to decide whether a certain property holds (e.g., a data object is garbage, an entity is deadlocked, all entities have terminated their execution), and the property is stable: if no external event occurs in the system, the property will continue to hold.

In this Chapter we will examine two of these problems in detail, designing efficient solutions for them. We will then attack the task of designing a generic solution to the problem of detecting whether a stable property holds, regardless of the specific nature of the property.
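As a preview of termination detection, here is a sketch of the credit-recovery (weight-throwing) idea: a controller holds weight 1, every message carries part of the sender's weight, and a passive entity returns its weight, so the controller's weight returns to 1 exactly when no entity is active and no message is in transit. The activation map `work` and the sequential simulation are illustrative assumptions; exact fractions keep the accounting precise.

```python
from collections import deque
from fractions import Fraction

def detect_termination(work, start):
    """Termination detection by weight throwing: the controller keeps
    half of its unit weight and attaches the other half to the initial
    message; an entity activated by a message with weight w gives half
    of its remaining weight to each message it sends (work[u] lists
    the entities u activates), then becomes passive and returns the
    leftover weight to the controller.  The controller's weight is 1
    again exactly when the computation has globally terminated."""
    controller = Fraction(1, 2)                # half was sent out
    in_transit = deque([(start, Fraction(1, 2))])
    activations = 0
    while in_transit:
        u, w = in_transit.popleft()            # u becomes active
        for v in work.get(u, []):              # u sends new messages,
            w /= 2                             # each carrying half of
            in_transit.append((v, w))          # u's remaining weight
        controller += w                        # u passive: weight back
        activations += 1
    return controller == 1, activations

work = {0: [1, 2], 1: [3]}                     # 0 activates 1 and 2, ...
done, steps = detect_termination(work, 0)
```

The invariant is that the controller's weight plus all in-transit weights always sum to 1, so detection can neither be premature nor be missed, which is precisely the stability of the property being exploited.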

Chapter 9. Continuous Computations.

When discussing computations in distributed environments, we have so far always considered computations that, once started, terminate within finite time. There are, however, computations that never terminate. These are, for example, the computations needed for the control and maintenance of the environment, and they are ``on'' as long as the system is ``on'': the protocols composing a distributed operating system, the transaction management protocols in a distributed transaction system, the network service protocols in a data communication network, the object management functions in a distributed object system, and so on. Because of this nature, these computations are called continuous computations. Since the computation never ends, the efficiency of a continuous computation is measured in terms of either its cost per basic operation it implements or its cost per basic event triggering its action.

In this Chapter we will examine some basic problems whose solution requires continuous computations: maintaining logical clocks, controlling access to a shared resource or service and maintaining a distributed queue (distributed mutual exclusion), and detecting and resolving deadlocks.
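The first of these, logical clocks, can be sketched with Lamport's classical rules: the clock advances at every local event, every message carries a timestamp, and on receipt the clock jumps past the timestamp, so every send is logically earlier than the corresponding receive. The tiny `Entity` class below is an illustrative stand-in for a real entity.

```python
class Entity:
    """An entity maintaining a Lamport logical clock: increment the
    clock at every internal event and every send; on receiving a
    message, jump past its timestamp.  This guarantees that causally
    related events get increasing clock values."""
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock              # timestamp carried by the message

    def receive(self, timestamp):
        self.clock = max(self.clock, timestamp) + 1
        return self.clock

a, b = Entity(), Entity()
a.local_event()                        # a's clock becomes 1
t = a.send()                           # a's clock becomes 2; msg stamped 2
r = b.receive(t)                       # b's clock jumps to 3 > 2
```

The maintenance never terminates: every future event, at any entity, keeps updating the clocks, which is exactly what makes this a continuous computation with a cost per event rather than a total cost.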

Some continuous problems are just the (endless) repetition of a terminating problem (plus adjustments); others could be solved in that way but also have unique non-terminating solutions; still others do not have any terminating counterpart. In this Chapter we will examine continuous problems of all these types.