CO-DESIGN OF A FAULT-TOLERANT
COMMUNICATION PROTOCOL—A CASE STUDY

by

Reinhard von Hanxleden, Luciano Lavagno, and
Alberto Sangiovanni-Vincentelli

Memorandum No. UCB/ERL M97/13

26 February 1997
CO-DESIGN OF A FAULT-TOLERANT COMMUNICATION PROTOCOL—A CASE STUDY

by

Reinhard von Hanxleden, Luciano Lavagno, and Alberto Sangiovanni-Vincentelli

Memorandum No. UCB/ERL M97/13

26 February 1997

ELECTRONICS RESEARCH LABORATORY

College of Engineering
University of California, Berkeley
94720
Co-Design of a Fault-Tolerant Communication Protocol—A Case Study

Reinhard von Hanxleden
Responsive Systems (F3S/R)
Daimler-Benz AG
vhanx@DBresearch-berlin.de

Luciano Lavagno
Dipartimento di Elettronica
Politecnico di Torino
lavagno@polito.it

Alberto Sangiovanni-Vincentelli
Dept. of EECS
University of California, Berkeley
alberto@eecs.berkeley.edu

Abstract

Hardware/software co-design is a design technology that supports the integrated development of hardware and software components of a system. A special focus of the work described here is the application domain of safety-critical embedded systems. In addition to the raised abstraction level and shortened design time offered by co-design relative to conventional design methods, we are therefore particularly interested in paradigms based on formal models.

This paper gives a brief overview of a co-design case study based on an application typical for safety-critical applications. We compared two ways of designing embedded systems: STATEMATE, a commercial system design tool based on Statecharts, and POLIS/Ptolemy, with ESTEREL as a specification language. We describe our design experiences and give preliminary experimental results.

1. Introduction

Hardware/software co-design is a design methodology—or rather, a collection thereof—that has become increasingly popular in the design of embedded systems. A particular class of embedded system applications are safety critical applications, which are the focus of the work described in this paper. Some requirements of this class of applications are shared with other applications’ requirements, such as short design times or low costs of the final product. Other requirements appear in slightly different guise; e.g., performance is critical, but typically only in that certain deadlines have to be met, beyond which there is no benefit from improving performance further. Requirements that are particularly important whenever safety is an issue are for example simplicity and formality—both critical not only for a proper design itself but also for certification purposes. Finally, there are requirements that are very specific to safety-critical applications, such as the ability to specify fault-containment regions.

As a first step in assessing the state of the field with respect to safety-critical applications, we are performing a case study based on an application typical for this field. In this paper, we present a high-level comparison of the design methodology followed using STATEMATE, a commercial system design tool based on Statecharts, and POLIS/Ptolemy, with ESTEREL as a specification language. We describe our design experiences and give preliminary experimental results.

The POLIS model of computation is based on Co-design Finite-State Machines, i.e. FSMs that communicate asynchronously. The global asynchrony is stressed to be able to mix hardware and software components since they have widely different reaction times and the communication among different tasks does take a finite amount of time. The STATEMATE semantic is based on synchronously communicating FSMs and hence it does not allow to mix hardware and software components. To compare the methodologies we had to restrict the capabilities of POLIS and to explore only two extremes of the hardware-software trade-off: fully software implementation and fully hardware implementation. Nevertheless this experiment allowed us to draw some interesting conclusions about the specification methods used by the two systems as well as their relative power.

2. The application: PROSA

PROSA (Protocol for Safety-Critical Applications) is a reliable communication protocol for safety-critical distributed systems currently developed at Daimler-Benz. Such communication systems are the backbone of applications such as fly-by-wire, steer-by-wire, or brake-by-wire, and naturally not only their performance but also their robustness has to meet particularly high standards. We chose PROSA as the application driving the co-design case study, motivated by the overall relevance of the application for Daimler-Benz, the suitable size and complexity, and the local availability of expertise. From a natural language description of the protocol we derived a simplified version of PROSA as a starting point for the case study.
3. The baseline: STATEMATE

STATEMATE, the "official" implementation of Statecharts [5], is a commercial tool that allows high-level system design and offers synthesis paths to both hardware and software. STATEMATE uses Statecharts to describe the behavior of a component within the system to be specified. In addition, Activitycharts describe which components a system consists of and what the signal and data paths are between the components. Finally, Module Charts describe the structure of the system (as opposed to the functionality—Activitycharts—and behavior—Statecharts).

Figure 1 shows a subset of the total of 15 State- and Activitycharts that describe PROSA. The abstract timing model uses time-outs as provided by STATEMATE. A watchdog simulates the bus traffic and randomly sends frames and veto signals.

3.1. Positive Aspects

Overall the implementation of STATEMATE appears to be quite complete and robust; even the novice can specify and simulate simple applications fairly quickly. During the simulation the current states and transitions are highlighted in the Statecharts with different colors. The animation of input and output signals and conditions with graphical "panels" is simple. One can check properties such as potential deadlock and reachability of certain states. (An example of this within a traffic signal application would be to check whether a state with the simultaneous sub-states "pedestrian signal green" and "car signal green" is reachable.)

3.2. Drawbacks

A weak point of STATEMATE was the archaic and not always consistent graphical user interface—this has somewhat improved with the Motif-based GUI of STATEMATE MAGNUM. Another problem is replicating and instantiating modules, which is difficult at best. It is possible to synthesize C or VHDL; however, due to the so far prevailing inefficiency of the resulting code, this feature seems better suited for just testing or prototyping, rather than actual development. Related issues are the openness and compatibility to other tools, which are limited. Other aspects of the tool that are of potential interest are the possibility to incorporate external C-code and the generation of detailed reports with the aid of a Document Generation Language.

A well-recognized problem that hampers not only efficient automatic code synthesis but also the general clarity of a design is the sheer richness of the Statecharts dialect implemented by STATEMATE. For example, the undisciplined use of event broadcast and Statechart-parallelism at lower levels can make it arbitrarily difficult to understand the flow of causality in a design. History connectors and real-time time-outs can introduce non-determinism that is not statically detectable.

Furthermore, STATEMATE is a powerful tool, but does not come with a methodology. As a consequence, "style guides" have been developed by the user community to guide the designers towards clear specifications—one such guide has also been developed at Daimler-Benz. It would be desirable to get tool-support for this as well, for example by automatically checking whether a design conforms to certain rules.

3.3. Summary of STATEMATE

STATEMATE can be a very useful aid in describing and specifying control-driven applications, if used in a disciplined manner. However, depending on the actual requirements on the end result, it may be better to view STATEMATE as just the beginning of an engineering chain, where for example a separate synthesis step turns the STATEMATE-specification into a quality prototype. STATEMATE is not a full co-design tool in that it does not allow a mixing of hardware and software components—it is either all C or all VHDL. However, it is an interesting candidate as a front-end embedded into a co-design environment.
4. The POLIS System

The POLIS co-design environment is targeted towards real-time control systems composed of software on a microcontroller and semi-custom hardware [3, 4]. The POLIS system translates a formal specification of a reactive real-time system written as an asynchronous interconnection of synchronous modules. Asynchronicity is necessary in order to accommodate heterogeneous implementations from a relatively “neutral” extended FSM specification.

Since the semantics of STATEMATE do not allow to mix hardware with software components, we used POLIS to synthesize two different versions from an ESTEREL specification of PROSA, a pure software version (C) and a pure hardware version (VHDL), and simulated both of them within the POLIS system. Due to the different targets, the semantics of the different versions vary slightly at the low level—which turned out to be rarely a problem in practice, but is still something that an engineer should be aware of when designing a system. As described above, the model of synchronicity postulates reactions to be instantaneous. This hypothesis is fairly expensive to implement in a homogeneous architecture (e.g., all in software or all in hardware), and almost impossible in a heterogeneous architecture. Moreover, it is valid only insofar as the embedded controller can react much faster than the minimum time constant in the environment to all inputs, while it may be often more efficient to check the speed of such reaction times only on an input-by-input basis.

In POLIS, hence, only single FSM transitions are considered atomic operations, while communication (including the manifestation of the results of a transition) takes an a priori non-deterministic time. This non-determinism can cause some problems, especially when a signal propagates along two distinct, reconvergent paths, and those paths can have widely different timings under different implementation choices. While it would theoretically be possible to detect when these delays can critically affect the overall behavior of the system, the POLIS design environment currently provides only means to estimate and simulate this timed behavior. This high-level timed co-simulation can be carried out once a set of choices about the implementation of each FSM, about the scheduling of software FSMs, and about the CPU type and speed has been made.

4.1. The ESTEREL specification language

A designer working with the POLIS co-design system has the choice between a graphical input language, which is basically a Finite State Machine (FSM) editor, and a text-based one, which is ESTEREL [1]. The graphical input language is similar to Statecharts but without the capabilities for hierarchy and parallelism; we therefore chose the ESTEREL front-end.

ESTEREL and STATEMATE use approximately the same underlying model of reactivity and synchronicity; this and the interest in a comparison of the languages suggested to attempt a direct mapping of the STATEMATE specification of PROSA into an ESTEREL version. However, this approach turned out to be sensible only up to a certain point; there are some concepts in both languages that are unique to either and whose direct emulation in the other would result in a rather clumsy specification. Furthermore, finding a satisfying, natural decomposition and description of the requirements on the protocol engine turned out to be fairly demanding. However, this iteration also gave a chance to simplify the existing specification. Overall, the process of understanding and summarizing the existing specification, decomposing it again, and filling in the modules one by one was still quite straightforward. Another nice facet of working with ESTEREL is the availability of an Emacs ESTEREL-mode, which highlights keywords and comments, indents properly, closes blocks, etc. A simulation script of about 1300 lines stepped the system successfully through various scenarios of stations initializing and integrating themselves with and without transmission errors and disconnecting themselves again; a fragment of the script is shown in Figure 2. After about two weeks the specification reached a steady state, consisting of 18 modules with about 1100 lines total.

At first glance, ESTEREL appears to be at a somewhat lower level and “pickier” than STATEMATE. Unlike STATEMATE, which accepts incomplete and even non-deterministic specifications and allows simulation and synthesis with only dynamic checks, an ESTEREL specification has—by default—to be statically proven to be sound (“causal”) before it can be further processed. This approach has to be inherently conservative; that is, certain programs may be rejected even though they would never result in problems at run time.

There is also a temptation to write ESTEREL-programs with a sequential, data-driven, von-Neumann machine model in mind. It does require some re-thinking to produce an ESTEREL program that is truly in the spirit of synchronous languages—and does not look like a C-program. For example, states in ESTEREL are preferably represented as halting points in the program, rather than by explicit variables; this concept is not completely obvious, but quite intuitive once understood.

4.2. The POLIS-Ptolemy interaction

The Ptolemy system is a software environment for simulation and prototyping of heterogeneous systems [2, 6]. It uses object-oriented software technology to model each subsystem in a natural and efficient manner, and it has mecha-
Figure 2. Fragment of an ESTEREL simulation script. Comments are prefixed with "%%", reactions of the system are prefixed with "@".

Figure 3. Ptolemy schematic of PROSA. The "stars" represent different modules, one of which (RECV.VETO) is enlarged on the lower right. There are additional inputs (on the left) and outputs (right).

5. Experimental results

5.1. Software synthesis—Accuracy of size and speed estimates

A critical component of a co-design environment is performance prediction, which can guide the developer in deriving a good partitioning between hardware and software. Furthermore, tight bounds on the worst case execution time are of particular interest to safety-critical applications. To measure the prediction qualities of POLIS, we estimated both code size and speeds and compared these data with the actual results.

The total actual number of clock cycles on a DECstation 5000 executed by the scheduler, the I/O drivers and the user tasks was 1106325. The total actual number of clock cycles executed by the user tasks is 344185. The POLIS timing estimator reported an estimated 345614 clock cycles for the user tasks only. (The Real-Time Operating System synthesized by POLIS has not yet been characterized for the R3000, so no estimates for the other cycles are available.)

Table 1 compares the estimated and actual code speeds and sizes for each task. The speed comparison is based on the 510 simulation cycles that have also been used for functional validation. The estimates are from a dynamic simulation, the actual values are obtained with pixie. The estimation parameters were obtained without compiler optimizations (-O0), so the fairest comparison is between the columns labeled "Estimated avg." and "Actual average -O0". The estimator tries to take into account the cost of some RTOS operations, and that explains part of the discrepancy between the columns. A quantitative comparison (column "Δ") lists the ratio between the differences between actual and estimated timings and the actual timing. This gives an estimate of the error due to ignoring the effects of an aggressive optimizing compiler (the MIPS cc). Such effects are impossible to take into account by using the current estimation technique in POLIS, because it uses only local information. Note, however, that at least for modest effort (-O0), the optimizations decrease the accuracy of the estimator (summarized in the line labeled "Average estimation error"), but are not hampering predictability (given by the variance). In other words, the effect of optimization can be statistically estimated with good accuracy by uniformly dividing by 1.4 all the figures obtained without
### Table 1. Comparison of estimated and actual code speeds and sizes for each task.

<table>
<thead>
<tr>
<th>Task</th>
<th>Number of calls</th>
<th>Code Speed (clock cycles)</th>
<th>Code Size (Bytes)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>Estimated</td>
<td>Actual</td>
</tr>
<tr>
<td></td>
<td></td>
<td>min</td>
<td>max</td>
</tr>
<tr>
<td>CRC</td>
<td>18</td>
<td>44</td>
<td>90</td>
</tr>
<tr>
<td>INIT</td>
<td>35</td>
<td>48</td>
<td>105</td>
</tr>
<tr>
<td>MODE</td>
<td>88</td>
<td>56</td>
<td>125</td>
</tr>
<tr>
<td>NORMAL_ERROR</td>
<td>46</td>
<td>52</td>
<td>90</td>
</tr>
<tr>
<td>RECV</td>
<td>128</td>
<td>46</td>
<td>97</td>
</tr>
<tr>
<td>RECV-BEGIN_FRAME</td>
<td>513</td>
<td>39</td>
<td>117</td>
</tr>
<tr>
<td>RECV_FRAME</td>
<td>113</td>
<td>79</td>
<td>139</td>
</tr>
<tr>
<td>RECV_VETO</td>
<td>137</td>
<td>49</td>
<td>129</td>
</tr>
<tr>
<td>SEND</td>
<td>43</td>
<td>39</td>
<td>90</td>
</tr>
<tr>
<td>SEND_FRAME</td>
<td>634</td>
<td>71</td>
<td>146</td>
</tr>
<tr>
<td>SEND_VETO</td>
<td>620</td>
<td>55</td>
<td>106</td>
</tr>
<tr>
<td>SLICE</td>
<td>21</td>
<td>45</td>
<td>136</td>
</tr>
<tr>
<td>STATE_VECTOR</td>
<td>31</td>
<td>96</td>
<td>218</td>
</tr>
<tr>
<td>STATION</td>
<td>21</td>
<td>54</td>
<td>75</td>
</tr>
<tr>
<td>SYNCH</td>
<td>31</td>
<td>55</td>
<td>96</td>
</tr>
<tr>
<td>TSC</td>
<td>486</td>
<td>95</td>
<td>160</td>
</tr>
<tr>
<td>WINDOW</td>
<td>466</td>
<td>82</td>
<td>166</td>
</tr>
<tr>
<td>Average estimation error</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Variance</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

With full optimization (\(-\text{O2}\)), predictability suffers as well.

Table 1 also reports a comparison between estimated and actual code size (in bytes) for each task on the MIPS R3000. The estimation parameters were obtained without compiler optimizations (\(-\text{O0}\)), so the fairest comparison is between the first two columns, as for the code speed.

A first conclusion that can be drawn from these figures is that the overhead due to the synthesized RTOS is fairly high. This is mostly due to the fact that the RTOS is optimized for size rather than speed. It requires 19 Kbytes out of a total of 45952 Kbytes for the complete synthesized system on a MIPS R3000.

For the sake of comparison, the total code size (including a simple simulation interface) was 63.3Kbytes for POLIS and 86.2Kbytes for ESTEREL (version 5.0). The total execution times were 866K clock cycles and 242IK clock cycles, on a DECstation 5000 respectively.

### 5.2. Hardware synthesis

An all-hardware synthesis of PROSA resulted in 392 latches and about 88k literals for a plain RTL synthesis. An optimization step had little effect on the number of latches (391), but could significantly decrease the number of literals (6.6k). We are estimating that on an actual FPGA, this translates to about 1200 CLBs (without merging). We are currently in the process of validating these data experimentally.

### 6. Conclusion and outlook

We chose hardware/software co-design as a guiding principle for prototyping a controller implementing PROSA, a communication protocol specifically designed for safety-critical applications. We have completed the specification of a first version of PROSA in ESTEREL; the process of specifying and simulating the protocol with co-design tools has already been very valuable for understanding and refining it. We are now in the process of building a first physical prototype based on the Motorola 68332 in the context of a Brake-by-Wire demonstrator.

Overall it appears that the field of co-design has progressed fairly well in general, in particular as far as process-integration and automation are concerned; it is possible to start at a high level of specification and arrive at an efficient prototype design in a largely automated fashion. For safety critical applications particularly beneficial are easy accessibility and understandability of the specification, offered by a tool such as STATEMATE, and the formality, simplicity, and efficiency provided by ESTEREL/POLIS. What appears to be lacking so far, however, is a combination of these properties. Other aspects of a co-design environment that would be desirable from a safety-critical application's point of view
are for example the ability to specify fault-containment regions, within for example a hardware partition, or the ability to annotate the specification with non-functional requirements, such as a desired degree of reliability.

Acknowledgments

Jürgen Bohne has made the PROSA-application available and has assisted in translating the natural-language specification into a formal model. Slavomir Kupczyk and Carsten Sühl have written the STATEMATE specification of PROSA. Ralph Sasse has been providing his expertise and help with regard to the prototyping hardware and the Brake-by-Wire demonstrator. Throughout the project, members of the POLIS and the ESTEREL groups have been very helpful, in particular Gerard Berry, Xavier Fornari, Harry Hsieh, Attila Jurecska, Claudio Passerone, and Ellen Sentovich.

References