Operating systems (OSs) are among the most sophisticated software systems in widespread use, and among the most expensive and time-consuming to develop and maintain. OS software must also be robust and dependable, since OS failures can result in system crashes that corrupt user data, endanger human lives (cf. embedded systems), or provide open avenues of attack for hackers or even cyber-terrorists.
OSs present their designers with enormous development challenges. On the one hand, many activities inside an OS happen concurrently: caches are flushed periodically; processes and threads are stopped and restarted; interrupts and other signals arrive at random times and must be handled promptly; data can be transferred through multiple channels (memory, DMA, I/O buses). Concurrent processing introduces well-known difficulties into the traditional code-test-debug paradigm, since errors can be difficult to repeat owing to race conditions between concurrent processes. On the other hand, debugging even sequential OS modules poses difficulties, since the OS's closeness to the actual computing hardware requires in situ testing.
It is also important that OS system-call interfaces be well documented so that they may serve as useful design guides for OS implementers and as interface definitions for application developers. Man pages, currently the primary source of documentation for system-call interfaces, are often incomplete, vague, ambiguous, and even incorrect. Another often under-appreciated aspect of OS software is the profusion of different OSs in use, particularly in the embedded-systems arena where OSs such as VxWorks or pSOS or (more often) proprietary ad hoc OSs are deployed.
The above observations highlight the great impact that improved OS development techniques would have on all enterprises that produce or use OS software. If man pages could be guaranteed to be correct and complete; if system calls could be certified to be free from deadlocks and memory leaks; if causes of system crashes could be quickly diagnosed; then the savings to the OS and application-development communities would be enormous. If this could also be accomplished while reducing OS development costs, then the impact is even greater. We refer to this ideal--better OSs at lower cost-- as High-Confidence Operating Systems (HCOS).
In this paper, we present an overview of our work on bringing the HCOS concept to bear on the practice of OS development. Section 2 presents the overall methodology we are pursuing, a central component of which is the Concurrent Class Machines (CCM) modeling formalism. Section 3 describes how we are using CCMS to model system-call man pages. Section 4 discusses how we verify our CCM models against different kinds of requirements. Section 5 concludes with a status report.
In related work, efforts to validate OSs fall into three main categories:
verification techniques [3], compilation techniques [7,4], and external
runtime testing [6]. The references given are a sampling.
The HCOS approach focuses on the formal modeling of OS system calls
and their interfaces, and utilizes newly developed techniques from all
three categories.
In Proc. of EW'02, the 10th ACM SIGOPS European Workshop "Can we
really depend on an OS?", pp. 205-208, September 2002, Saint-Emilion,
France.