We introduce the technique of Software Monitoring with Controllable Overhead (SMCO), which is based on a novel combination of supervisory control theory of discrete-event systems and PID-control theory of discrete-time systems. SMCO controls monitoring overhead by temporarily disabling monitoring of selected events for as short a time as possible under the constraint of a user-supplied target overhead o_t. This strategy is optimal in the sense that it allows SMCO to monitor as many events as possible, within the confines of o_t. SMCO is a general monitoring technique that can be applied to any system interface or API.
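To make the feedback idea concrete, the following is a minimal sketch of a discrete-time PID loop that steers observed overhead toward the target o_t by choosing what fraction of the next period monitoring stays disabled. It is an illustration under stated assumptions, not the controller used in SMCO: the class name, the gains, the choice of "disabled fraction" as the control output, and the linear toy plant in `simulate` are all hypothetical, and the supervisory-control side of the technique is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PIDOverheadController:
    """Hypothetical PID controller; gains are illustrative, not tuned values."""
    target: float                 # target overhead o_t as a fraction (0.10 = 10%)
    kp: float = 0.5               # proportional gain (assumed)
    ki: float = 0.5               # integral gain (assumed)
    kd: float = 0.0               # derivative gain (assumed)
    _integral: float = field(default=0.0, init=False)
    _prev_error: float = field(default=0.0, init=False)

    def update(self, observed: float) -> float:
        """Return the fraction of the next period with monitoring disabled."""
        error = observed - self.target      # positive when over budget
        self._integral += error             # note: no anti-windup in this sketch
        derivative = error - self._prev_error
        self._prev_error = error
        u = self.kp * error + self.ki * self._integral + self.kd * derivative
        return min(max(u, 0.0), 1.0)        # clamp to a valid fraction


def simulate(full_overhead: float, target: float, steps: int) -> float:
    """Toy plant (assumed): overhead scales linearly with the enabled fraction."""
    ctrl = PIDOverheadController(target=target)
    off = 0.0                               # start with monitoring fully enabled
    observed = full_overhead
    for _ in range(steps):
        observed = full_overhead * (1.0 - off)
        off = ctrl.update(observed)
    return observed


if __name__ == "__main__":
    print(simulate(full_overhead=0.40, target=0.10, steps=200))
```

In this toy setting the loop disables monitoring only as much as needed to hold the overhead near the target, and leaves monitoring fully enabled whenever the unthrottled overhead is already below it.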
We have applied SMCO to a variety of monitoring problems, including two highlighted in this paper: integer range analysis, which determines upper and lower bounds on integer variable values; and Non-Accessed Period (NAP) detection, which detects stale or underutilized memory allocations. We benchmarked SMCO extensively, using both CPU- and I/O-intensive workloads, which often exhibited highly bursty behavior. We demonstrate that SMCO successfully controls overhead across a wide range of target-overhead levels, that its accuracy increases monotonically with the target overhead, and that it can be configured to distribute monitoring overhead fairly across multiple instrumentation points.
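As a small illustration of why accuracy depends on how many events are observed, the sketch below tracks integer ranges from assignment events and skips events while monitoring is disabled. It assumes the analysis observes assignments at run time; the class `RangeMonitor`, its `enabled` flag, and the `on_assign` hook are hypothetical names introduced here, not part of SMCO's interface.

```python
class RangeMonitor:
    """Hypothetical run-time range tracker for integer variables."""

    def __init__(self) -> None:
        self.bounds: dict[str, tuple[int, int]] = {}  # var -> (lower, upper)
        self.enabled = True                           # toggled by a controller

    def on_assign(self, var: str, value: int) -> None:
        """Record an assignment event, unless monitoring is disabled."""
        if not self.enabled:
            return                                    # event is missed
        lo, hi = self.bounds.get(var, (value, value))
        self.bounds[var] = (min(lo, value), max(hi, value))


if __name__ == "__main__":
    m = RangeMonitor()
    m.on_assign("x", 5)
    m.on_assign("x", -2)
    m.enabled = False       # monitoring temporarily disabled
    m.on_assign("x", 100)   # this value is never observed
    m.enabled = True
    m.on_assign("x", 7)
    print(m.bounds["x"])
```

Any value assigned while monitoring is off is absent from the reported bounds, so in this sketch a larger overhead budget (less time disabled) can only keep the reported range the same or bring it closer to the true one.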
To appear in the International Journal on Software Tools for Technology Transfer (STTT), 2010, Springer.
*This work was supported by the NSF Faculty Early Career Development Award CCR01-33583 and the AFOSR Award FA-0550-09-1-0481.