Processes



Introduction
Races/Exclusive Use
Arbitration
Synchronization
Deadlocks
Time Sharing



Authors:

Halleh Fatheisian
Eric Rosenberger

Arbitration

The Arbitration Problem

Driving a car is something that millions of people do every day, often without even thinking about it. And, like any routine, most people are able to establish a level of comfort and skill over time that allows them to drive largely on "auto-pilot". However, driving is actually a very complex process, requiring critical decisions to be made with nearly every second that passes. Not only are there physical obstacles to avoid, but at any given moment there can be dozens of other drivers sharing the road nearby, none aware of each other's intentions. And, while there are traffic laws to help ease the task of anticipating their moves, there are some situations requiring decisions that confound even the most experienced drivers.

Take, for example, the four-way stop. By enforcing serial access to an intersection, it allows cars to travel through in several directions without crashing into one another. Normally, drivers at a four-way stop decide who can enter the intersection first on the basis of who arrived first. However, there is always the possibility that a couple of drivers will arrive simultaneously, or at least at close enough intervals so that no one is able to discern their actual order. The law addresses this situation also, by granting right of way first to the rightmost driver in the group. But the law does not address the situation where drivers arrive from all four directions simultaneously, ruling out either given method of arbitration.

So what happens then? Often, nothing at all. The cars sit motionless, their engines idling. Each driver is faced with the decision of driving into the intersection first or waiting for one of the other drivers to do so. Picking the former could prove disastrous if any of the other drivers did the same -- picking the latter could force an eternal wait. Neither option is desirable, and the drivers quickly become edgy and uncomfortable. There may be false starts, with cars lurching forward only to stop as their drivers realize that the other cars have lurched as well. There may be honking and flashing headlights as the drivers attempt to goad each other into going first. But no one is willing to take the plunge and claim the right of way. As time passes, and impatient looks become more intense, it is inevitable that someone will eventually decide to proceed -- yet it is impossible to know how long this will take, or who it will be, until it happens.

This problem, known as arbitration failure, has been explored by philosophers and scientists alike for hundreds of years. It stems from the fact that, in order to make a decision between different possibilities, one of them must be more desirable than the others. Without some means of singling one out, choice is impossible, and only the inherent chaos of the universe has any chance of breaking the tie -- and time is often on the side of chaos. It might be tempting to categorize this as a problem with human nature or a breakdown of human reasoning. In the four-way stop example, it appears to be just that. However, it is actually a fundamental principle of nature and can be observed even within the depths of electronic machinery.

In fact, electronics are perhaps in a worse position with regard to arbitration failure than humans. Arbiter circuits, which are designed to ensure that multiple incoming signals are processed one at a time, can become "confused" when more than one signal arrives simultaneously. Lacking reasoning, such circuits have no basis for preferring one signal over another when deciding which should be processed first. Thus, they can easily become trapped in an undefined state, failing to produce a result until noise or other instabilities in the circuit cause one signal to overcome the others. Known as a metastable state, this condition can be likened to a ball balanced atop a sharp point, the direction in which it will ultimately fall unknown until it is finally tipped by a passing breeze.

Arbiter circuits subject to this problem can be found in just about any device that processes signals. Take, for example, the common building elevator, which has a controller attached to call buttons on several different floors. Normally, the controller records the order in which various buttons are pressed, and sends the elevator to the corresponding floors in the same order. However, what if the elevator were idle on the fifth floor, and people on the fourth and sixth floors were to simultaneously call it? Clearly the controller would have to pick one of the two floors to service first -- however, the conflicting simultaneous signals would force the controller into a metastable state, leaving the elevator suspended indefinitely and the people stranded until the controller reached its decision.
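
The controller's dilemma can be sketched in a few lines of C. The sketch below is a hypothetical software model, not the logic of any real elevator controller: the arbitrate function, the floor numbers, and the tick-based timestamps are all invented for illustration. When the two calls carry distinct arrival times the choice is mechanical; when the timestamps are identical there is no rule left to apply, and the tie can only be broken by randomness, standing in here for the circuit noise that eventually resolves a metastable state.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* A call button press: which floor, and when its signal arrived. */
    struct call {
        int floor;
        long arrival;   /* arrival time in arbitrary clock ticks */
    };

    /* Decide which of two calls to service first.  With distinct arrival
     * times the decision is mechanical; with identical times there is
     * nothing to prefer, and only "noise" (modeled by rand()) breaks
     * the tie -- after an unpredictable wait in a real circuit. */
    static struct call arbitrate(struct call a, struct call b)
    {
        if (a.arrival < b.arrival)
            return a;
        if (b.arrival < a.arrival)
            return b;
        return (rand() % 2) ? a : b;   /* the metastable case */
    }

    int main(void)
    {
        srand((unsigned) time(NULL));

        /* Elevator idle on floor 5; floors 4 and 6 call at the same tick. */
        struct call fourth = { 4, 100 };
        struct call sixth  = { 6, 100 };

        printf("Servicing floor %d first\n", arbitrate(fourth, sixth).floor);
        return 0;
    }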

Fortunately, the probability of this happening with an elevator is small, just as four cars do not often descend upon an intersection at once. However, modern computers, which also contain arbiter circuits, are subjected to a nearly continuous barrage of incoming signals to process -- keystrokes, mouse movements, and network transmissions, to name a few. All of these signals must be dealt with one at a time. Furthermore, different devices within a single computer may attempt to read from or write to the system memory at once, including the main processor (or processors), disk controllers, video cards, and so on. The signals from all of these devices must be serialized to maintain the integrity of the memory. Overall, the number of opportunities for arbitration failure in a computer is much higher than in a simple case such as the elevator controller.

The metastable state induced by arbitration failure can be much more problematic in a computer as well. A computer's processor synchronizes its internal circuitry using a clock, and thus the timing of certain operations such as reading from memory is critical. If one of these operations were to take much longer than normal, the processor might not allow enough time for it to finish, assuming that results were ready even if they were not. And, if a critical part of the computer, such as the memory, were to enter a metastable state for an indefinite period of time as a result of arbitration failure, it is quite possible that this might happen, leading to erratic operation of the computer and corruption of data. Given the likelihood of arbitration failure in a computer, this possibility is unacceptable -- for reliable computing, a solution to this arbitration problem is a necessity.

Arbiters

It would seem that the semaphore, as described in an earlier section, should be subject to the arbitration problem. By definition, it must prevent access to a shared resource by more than one processor at a time. Therefore, processors using it for this purpose must have some means of determining the order in which they will gain control if a conflict arises. Normally, processors simply form a queue based upon their arrival order -- but, just like the drivers at the four-way stop, it is not always possible for any of the processors involved in a conflict to determine which processor came first. Yet, well-designed semaphore algorithms are able to consistently arbitrate, seemingly without being subject to the occasional indefinite delay caused by a metastable state in the system. How is this possible?

Actually, semaphore algorithms such as Dijkstra's rely on the fact that the memory circuitry in computers is designed in such a way that only one processor can be reading or writing data at once. Without this guarantee, there would be no completely reliable method of arbitration purely through software. Of course, in order for memory circuitry to spare semaphores from the arbitration problem, it then must be able to handle the potential access conflicts between processors. In fact, conflicts are even more likely at the hardware level, because even if multiple processors are not contending for access to the same object in memory, simultaneous signals could still collide and result in data corruption. Furthermore, such collisions can happen not only between processors, but between any devices in a computer that directly read or write memory, or, for that matter, directly access any other shared resource.
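
As a rough illustration of how such software leans on the hardware, the sketch below implements mutual exclusion with a C11 atomic flag. This is not Dijkstra's original algorithm; it is a minimal test-and-set spinlock, and the names acquire, release, and critical_section are ours. The point is that atomic_flag_test_and_set can serve as a lock only because the memory circuitry guarantees that two processors cannot read and modify the flag at the same instant.

    #include <stdatomic.h>
    #include <stdio.h>

    /* One shared flag guards the critical section. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;

    /* Spin until the flag was previously clear.  The hardware guarantees
     * that only one processor can observe "clear" and set it at once, so
     * at most one caller ever leaves this loop holding the lock. */
    static void acquire(void)
    {
        while (atomic_flag_test_and_set(&lock))
            ;   /* busy-wait: another processor holds the lock */
    }

    static void release(void)
    {
        atomic_flag_clear(&lock);
    }

    static int shared_counter = 0;

    static void critical_section(void)
    {
        acquire();
        shared_counter++;   /* only one thread of control runs here at a time */
        release();
    }

    int main(void)
    {
        critical_section();
        printf("counter = %d\n", shared_counter);
        return 0;
    }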

To maintain order in these situations, arbiter circuits are used with critical shared resources such as memory. Like semaphores, they guarantee serial access by allowing only one request to pass through them at a time -- and, like semaphores, they are prone to arbitration failure in the event that multiple request signals arrive at once. Arbitration failure in the hardware case, however, means more than just a delay while a metastable state is resolved. Hardware components such as processors expect operations like memory reads and writes to be finished in a certain amount of time, based on their internal clocks. Any unexpected delay while an arbiter circuit attempts to decide which request to handle first is not accounted for, and assumptions can be made that data are ready when they actually are not. So, even though the arbiter would decide eventually, and serial access would still be maintained, the computer might become unstable anyway. However, in modern computers this does not happen. How is this possible?
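
The danger can be caricatured in software. In the hypothetical simulation below (all names and timings are invented for illustration), the arbiter normally settles within one tick, but colliding requests give it a settling time that is unpredictable and, in principle, unbounded. A processor that samples the grant after a fixed number of ticks, as real clocked logic does, may then read an output that has not yet settled.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define CLOCK_PERIOD 10   /* ticks the processor waits before sampling */

    /* Simulated arbiter: returns which request (1 or 2) is granted and
     * reports how many ticks it took to settle.  Colliding requests push
     * it into a "metastable" state of unpredictable duration. */
    static int arbitrate(long req1_time, long req2_time, int *settle_ticks)
    {
        if (req1_time != req2_time) {
            *settle_ticks = 1;                    /* clean decision */
            return (req1_time < req2_time) ? 1 : 2;
        }
        *settle_ticks = 1 + rand() % 50;          /* unbounded in principle */
        return (rand() % 2) ? 1 : 2;
    }

    int main(void)
    {
        srand((unsigned) time(NULL));

        int settle;
        int grant = arbitrate(100, 100, &settle); /* simultaneous requests */

        if (settle > CLOCK_PERIOD)
            printf("Sampled after %d ticks, but the arbiter needed %d: "
                   "the grant read (%d) cannot be trusted.\n",
                   CLOCK_PERIOD, settle, grant);
        else
            printf("Arbiter settled in time; request %d granted.\n", grant);
        return 0;
    }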

The answer comes from research done in the earlier days of computers. Before the intricacies of the arbitration problem were fully understood, circuits would occasionally become unstable after their arbiters entered metastable states. This confounded many engineers, who expected that, given the speed at which requests could be processed, a collision of signals prolonged enough to cause arbitration failure would be statistically almost impossible. However, studies in the 1970s found, both mathematically and experimentally, that the probability was actually quite high: collisions occurred more often, and metastable states lasted longer, than predicted, resulting in more frequent failures.

In the late 1960s, David Wheeler, an engineer at the Computing Laboratory of the University of Cambridge, hit upon an idea for an arbiter that would ultimately save many computers from the arbitration problem. Recognizing that there was very little way that metastable states could be prevented in conventional computer hardware, he instead focused on preventing the potentially long delays they caused from creating instability. First used in the Cambridge CAP computer in 1974, his invention was like a normal arbiter circuit, except that it was connected directly to the computer's system clock. Whenever it was in the process of arbitrating incoming requests, it sent a signal to deactivate the clock. When its decision had been reached, it reactivated the clock. Thus, if a metastable state were to occur, the clock would remain deactivated until the situation was resolved -- and, with a deactivated clock, other parts of the computer could not continue blindly and become unstable because of timing assumptions.
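
Wheeler's idea can be mimicked by extending the previous sketch: rather than sampling the arbiter after a fixed number of ticks, the rest of the machine simply does not tick until the arbiter reports that it has settled. This is only a software caricature of what is really a circuit technique, and the clock_running flag and function name below are invented for the example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static int clock_running = 1;

    /* The arbiter deactivates the system clock while it is deciding and
     * reactivates it once a clean grant is available, so no other part
     * of the machine can sample an unsettled result. */
    static int arbitrate_with_clock_gate(long req1_time, long req2_time)
    {
        int grant;

        clock_running = 0;                        /* pause the clock */

        if (req1_time != req2_time) {
            grant = (req1_time < req2_time) ? 1 : 2;
        } else {
            int settle = 1 + rand() % 50;         /* metastable: take as long as needed */
            printf("(arbiter metastable for %d ticks; clock held)\n", settle);
            grant = (rand() % 2) ? 1 : 2;
        }

        clock_running = 1;                        /* resume the clock */
        return grant;
    }

    int main(void)
    {
        srand((unsigned) time(NULL));

        int grant = arbitrate_with_clock_gate(100, 100);
        printf("Clock restarted; request %d granted and safe to use.\n", grant);
        return 0;
    }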

Wheeler's arbiter does not altogether solve arbitration problems, because there are still some situations in which a computer cannot simply have its clock stopped indefinitely. For instance, real-time systems responsible for operations requiring precise timing could easily fail if suspended for any length of time. In these cases, some amount of compromise is necessary. Rather than a full guarantee that no arbitration failure will occur, these systems must accept some possibility that it will in exchange for freedom from delay. However, Wheeler's arbiter solves the arbitration problem sufficiently for most computers. Today it serves as a foundational model for concurrent computer processing.

Next section: Synchronization