INTRODUCTION TO SLAng

A language for ASP SLAs


Copyright UCL 2006

Disclaimer: None of the information provided on this page should be construed as legal advice. SLAng is under development and should not yet be used for real SLAs.

Requirements

SLAng is designed so that SLAs expressed in the language meet two carefully chosen sets of requirements:

Generic requirements: apply to SLAs for any domain. These include:

  • Understandability: agreements of any kind require a mutual understanding of terms between the involved parties. Such an understanding can be difficult to establish for SLAs. The qualities of a service must be defined in such a way that it is clear how to measure them. Constraints on the behaviours of the parties that impact on QoS must be clearly defined, so that the cause of an SLA violation can be unambiguously attributed to the party at fault.
  • Precision: for any given behaviour of the system it should be unambiguously clear what the consequences are in terms of penalty payments defined in the SLA.
  • Practicality: SLAs can be useful in a number of different contexts. They are part of contracts, and must therefore be understandable in legal contexts. They must also be usable by monitoring technologies, since conformance to SLAs must be checked. They may be used in technologies that adapt application deployments automatically to optimise performance or resource usage. Therefore SLAs should be expressed formally, but using technologies that make them easy for humans to understand and manipulate.
  • Monitorability: it is unwise to accept an undertaking from another party to fulfil a responsibility or pay a penalty if you will not be able to tell whether the party has fulfilled the responsibility. Therefore, SLAs should only place conditions on events that can be monitored by both parties, or reported on by trusted third parties.

ASP requirements: determine the set of constraints that it is appropriate to include in an SLA for application-service provision. From a theoretical point of view, it is helpful to consider the requirements that the various parties will wish to apply to the events occurring in the scenario.

In the above diagram, the events corresponding to request dispatch by the client, request receipt at the service, response dispatch by the service and response receipt by the client are labelled x, y, z and w respectively. These events can be considered to have two major attributes: the time at which they occur and the content of the message being sent or delivered.

The client's requirement in this case is on w relative to x: w must occur within some time-limit of x and must be the correct response given the content of x and the state of the service when x occurred. These are latency and reliability constraints.
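As an illustration of the client's requirement, the following Python sketch records the four events of one request as timestamps and checks w against x. It is a minimal sketch, not SLAng's semantics: the `Interaction` class, its fields and the `expected_response` argument are hypothetical, and judging the correct response "given the content of x and the state of the service" is reduced to a comparison with a value supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One request/response exchange (hypothetical model, not SLAng's own).

    Times are in seconds. y and z are visible only to the service provider
    and the ISP, so the client-side check below never reads them.
    """
    x: float                   # request dispatched by the client
    y: Optional[float]         # request received by the service (None if lost)
    z: Optional[float]         # response dispatched by the service
    w: Optional[float]         # response received by the client (None if never)
    response: Optional[str]    # message content delivered at w


def client_requirement_met(i: Interaction, time_limit: float,
                           expected_response: str) -> bool:
    """Latency and reliability requirement on w relative to x."""
    if i.w is None:
        return False                            # no response: a failure
    on_time = (i.w - i.x) <= time_limit         # latency constraint
    correct = i.response == expected_response   # reliability constraint
    return on_time and correct


ok = Interaction(x=0.0, y=0.1, z=0.4, w=0.5, response="balance=10")
late = Interaction(x=0.0, y=0.1, z=2.9, w=3.2, response="balance=10")
print(client_requirement_met(ok, 1.0, "balance=10"))    # True
print(client_requirement_met(late, 1.0, "balance=10"))  # False: too slow
```

Because the check reads only x, w and the delivered content, the client can evaluate it from its own vantage point; y and z play no part.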

The service requires that y does not occur too often, as requests may potentially overwhelm the service. This is a throughput constraint.

The ISP requires that x and z do not occur too frequently as these may overload the network. These are also throughput constraints.

A language for ASP SLAs must therefore allow SLAs to be agreed that let the parties insure these requirements, by associating penalty payments with their violation. Multiple SLAs may be required.

SLAng design points

Designed for precision and usability:

The SLAng semantics are defined by associating a model of the language with a model of service usage. The presence of SLA elements imposes constraints on the possible behaviours that can be observed in the model. The model is defined using EMOF, a language similar to UML class diagrams. This makes it easy for IT professionals who need to reason about QoS to see how the model corresponds to their systems. The constraints are described using the Object Constraint Language (OCL), which has well-defined semantics of its own, so the constraints are unambiguous.

Designed for practicality:

The adoption of EMOF as the language for the meta-modelling of SLAng confers a number of practical advantages. EMOF is a standard of the OMG, and is intended for the design of meta-data repositories. The OMG also define a pair of specifications for meta-data interchange which may be used for languages defined using EMOF. The first, XMI, is based on XML, meaning that SLAng SLAs can be exchanged in a manner compatible with existing XML-based web-services standards, such as WSDL and SOAP. The second, the Human-Usable Textual Notation (HUTN), is more suitable for human use. The UCL MDA tools, used to define SLAng and implement the Eclipse plug-in for the language, support both of these standards.

Designed for monitorability:

Central to the design of SLAng is the idea of monitorability: SLAs should only place constraints on events that are visible to both parties. If both parties can monitor the SLA, then each can check that the other is not cheating when violations and penalty payments are determined.

The challenge in the context of application services was to design a language that can express SLAs insuring the requirements of the parties described above in a monitorable fashion, and in a safe fashion, such that parties only insure requirements that they themselves can guarantee or have insured. In this scenario, the client can monitor events x and w, the service provider can monitor y and z, and the ISP can monitor all events.

A theoretical analysis of the ASP scenario reveals that, for latency and other requirements, only a single system of SLAs is possible in which all SLAs are both monitorable and safe. In this system, displayed below, the ISP offers a guarantee to the client insuring the performance of the electronic service (an ES SLA) as it is delivered to the client's interface. The service provider then enters into a second SLA with the ISP, insuring the performance of the service as it is delivered to the network.

Note that this configuration of SLAs is monitorable, because each SLA only constrains events that are visible to both parties to that agreement. The configuration is also safe. The service provider only insures conditions that it can guarantee directly. The ISP insures the total round-trip QoS for the client; however, if a delay or error is introduced by the service, then the service provider will compensate the ISP, which can then compensate the client.
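The monitorability and safety arguments can be checked mechanically. The Python sketch below encodes which party can observe which event, as stated above, and tests a configuration of SLAs. The split of the round trip into three hops (network, service, network) and the rule used for safety, that every hop covered by an SLA is either controlled by its provider or insured to that provider by another SLA, are simplifying assumptions made for illustration; the names `SLA`, `monitorable` and `safe` are hypothetical.

```python
from dataclasses import dataclass

# Which party can observe which event, as stated in the text.
VISIBLE = {
    "client": {"x", "w"},
    "service": {"y", "z"},
    "isp": {"x", "y", "z", "w"},
}
ORDER = ["x", "y", "z", "w"]
# Assumed decomposition of the round trip: (from, to, party in control).
HOPS = [("x", "y", "isp"), ("y", "z", "service"), ("z", "w", "isp")]


@dataclass(frozen=True)
class SLA:
    provider: str   # party that insures the constraint (pays penalties)
    client: str     # party that is insured (receives penalties)
    start: str      # event the constraint is measured from
    end: str        # event the constraint is measured to


def monitorable(sla: SLA) -> bool:
    """Both parties can observe every event the SLA constrains."""
    events = {sla.start, sla.end}
    return events <= VISIBLE[sla.provider] and events <= VISIBLE[sla.client]


def hops_covered(sla: SLA) -> list:
    """Hops lying between the SLA's start and end events."""
    lo, hi = ORDER.index(sla.start), ORDER.index(sla.end)
    return [h for h in HOPS
            if ORDER.index(h[0]) >= lo and ORDER.index(h[1]) <= hi]


def safe(sla: SLA, config: list) -> bool:
    """Every hop is controlled by the provider or insured to it by another SLA."""
    for start, end, controller in hops_covered(sla):
        if controller == sla.provider:
            continue
        insured = any(other is not sla and other.client == sla.provider
                      and (other.start, other.end) == (start, end)
                      for other in config)
        if not insured:
            return False
    return True


es_sla = SLA(provider="isp", client="client", start="x", end="w")
service_sla = SLA(provider="service", client="isp", start="y", end="z")
config = [es_sla, service_sla]
print(all(monitorable(s) and safe(s, config) for s in config))  # True

direct = SLA(provider="service", client="client", start="x", end="w")
print(monitorable(direct), safe(direct, [direct]))               # False False
```

The ES SLA and the service SLA pass both checks, whereas a hypothetical SLA offered by the service provider directly to the client fails both: the service provider cannot observe x or w, and it does not control the network hops.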

Designed for the ASP scenario:

The language is currently capable of expressing the following types of constraint, implied by the ASP scenario:

  • Throughput constraints

    Throughput constraints are required to satisfy the requirements of the service provider and the ISP that they are not overwhelmed by request or response traffic. Throughput constraints are straightforward to define in a monitorable fashion, as we are only concerned with the arrival rate of requests or responses.

    Precisely defining the throughput constraint requires some consideration, however. Average throughput over a long period may not be a good indicator of peak load. Furthermore, constraining the interval between any two requests may place unnecessary constraints on the client's implementation. Most distributed architectures have a notion of request buffering or message queuing. Therefore it makes sense to define throughput in relation to the degree of request or response concurrency that the service can accommodate.

    To achieve this, SLAng specifies a throughput constraint by giving the size of a sliding time window and the maximum number of requests (the permitted concurrency) within any window of that size.

    Note that the sliding window notion gives rise to the useful concept of a period of over-utilisation. Within such a period, any sliding window would violate the throughput constraint. SLAng allows the specification of penalties related to throughput violations to vary with the length of the period of over-utilisation.

  • Timeliness and reliability constraints

    In SLAng, timeliness and reliability constraints are treated uniformly. Overdue responses from the service may have no business value to the client, so may be treated as a failure. Similarly, the business value of an incorrect response must be reckoned to be zero.

    Reliability constraints have similar requirements to throughput constraints. Reliability should not be reckoned over too long a period, or unacceptably long intervals in which the service is not accessible may occur but appear to be balanced by periods of intensive correct operation. Similarly, within a given timescale, some failures may be tolerable. Therefore in SLAng we define reliability constraints by specifying the maximum number of failures that may be observed within a sliding time window.

    Again, the sliding time window gives rise to the notion of a period of poor performance, in this case unreliability, which may be used to vary the application of penalties. Note that the parameters of a reliability clause need to take into account any throughput constraints that apply concurrently, as these constrain the maximum number of requests that may be made in total within the sliding window.

  • Availability constraints

    The common notion of availability is related to the state of the service. It may either be ready to be used, or not. However, in the ASP scenario the client can only observe the state of the service by submitting requests. If these fail then the service will be reckoned unreliable, so this contingency is already handled by SLAng.

    On the other hand, a service may be clearly broken, in that it never responds correctly to a particular request. In these circumstances, the client will not wish to poll the service continuously in order to remain entitled to penalties and to discover when the service is eventually fixed. Furthermore, the service provider would not wish the client to receive penalties for errors that were not promptly reported. Therefore, we have assumed that an additional channel of communication exists between the client and the provider of the SLA, besides the use of the service. SLAng includes availability constraints that relate to the exchange of bug reports and bug-fix reports via this channel.

    Availability penalties can be used to provide an incentive for the client to report bugs. They also provide an incentive for the service provider to fix them in a timely fashion.

  • Data currency and recovery constraints

    An aspect of reliability is that the data manipulated by the service should be correct and uncorrupted. SLAng includes data currency and data recovery constraints to ensure this.

    A data currency constraint applies to an operation of the service and states, for some of the data types manipulated by that operation, that the data should be the most recent data older than some threshold age. The threshold age accommodates the time taken to complete transactions, and can also be used to specify operations for accessing historical data. Data currency constraints are a type of reliability constraint, and are defined in the same fashion, but with reference to a different mode of failure of the service.

    A data recovery penalty may be invoked if data is lost by the service. If the service loses some data, it may never again be able to satisfy some data currency constraints, and its future output will be regarded as corrupt. Therefore, the service provider must report that the data has been lost, and state the age of the data that has been recovered and is now regarded as current. Since data loss is never wholly acceptable, this incurs a penalty, which may be varied according to the reported age of the recovered data.
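Throughput constraints and reliability constraints (including data currency constraints) described above share one mechanism: counting events within a sliding time window. The Python sketch below assumes a constraint is simply a window length and a maximum event count; pass request arrival times to check throughput, or failure times to check reliability. It approximates a period of over-utilisation or unreliability as the union of overlapping violating windows, which may differ from SLAng's precise definition, and the function names are hypothetical.

```python
from bisect import bisect_left


def violating_windows(times, window, max_count):
    """Windows [start, start + window) holding more than max_count events.

    Only windows that start at an event are examined: sliding any violating
    window forward to its first event can only add events to it.
    """
    times = sorted(times)
    found = []
    for i, start in enumerate(times):
        # j is the index of the first event at or after start + window
        j = bisect_left(times, start + window, lo=i)
        if j - i > max_count:
            found.append((start, start + window))
    return found


def poor_performance_periods(times, window, max_count):
    """Merge overlapping violating windows into periods (an approximation)."""
    periods = []
    for start, end in violating_windows(times, window, max_count):
        if periods and start <= periods[-1][1]:
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            periods.append((start, end))
    return periods


# Throughput: at most 3 requests in any 1-second window.
requests = [0.0, 0.2, 0.4, 0.6, 5.0, 9.0, 9.1]
print(violating_windows(requests, 1.0, 3))        # [(0.0, 1.0)]

# Reliability: at most 2 failures in any 5-second window.
failures = [10, 11, 12, 13, 40]
print(poor_performance_periods(failures, 5, 2))   # [(10, 16)]
```

The length of each merged period (here 6 seconds for the failures) is the kind of quantity against which penalties could be scaled, in the way the text describes for periods of over-utilisation and unreliability.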

Equally important to determining the type of constraints included in an SLA is determining when they should apply. SLAng permits the varying of SLA constraints according to:

  • Absolute time: conditions may be associated with schedules determining when they should apply.
  • The state of some mutually monitorable system: conditions may be associated with defined states of some system. The current state of the system must be mutually monitorable by the parties to the SLA in order to preserve the monitorability of the SLA overall. The state will be mutually monitorable if the initial state of the system when the SLA begins can be agreed, and events indicative of all changes of state are mutually monitorable by the parties.
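As a sketch of how these two mechanisms might combine, the Python code below decides whether a condition applies at a given moment by checking a schedule and by replaying the agreed initial state through the mutually monitored state-change events. The weekly `Schedule` shape, the state names and the function names are assumptions for illustration, not SLAng constructs.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Schedule:
    """Hypothetical weekly schedule: weekdays (0 = Monday) and a daily range."""
    days: frozenset
    start: time
    end: time

    def covers(self, t: datetime) -> bool:
        return t.weekday() in self.days and self.start <= t.time() < self.end


def state_at(initial_state, changes, t):
    """Replay mutually monitored (timestamp, new_state) events up to time t."""
    state = initial_state
    for when, new_state in sorted(changes):
        if when > t:
            break
        state = new_state
    return state


def constraint_applies(t, schedule, required_state, initial_state, changes):
    """A condition applies only inside its schedule and in the required state."""
    return (schedule.covers(t)
            and state_at(initial_state, changes, t) == required_state)


business_hours = Schedule(frozenset(range(5)), time(9), time(17))
changes = [(datetime(2006, 3, 6, 12, 0), "peak")]   # agreed by both parties
monday_afternoon = datetime(2006, 3, 6, 14, 30)
monday_morning = datetime(2006, 3, 6, 10, 0)
print(constraint_applies(monday_afternoon, business_hours, "peak", "normal", changes))  # True
print(constraint_applies(monday_morning, business_hours, "peak", "normal", changes))    # False
```

Because the state is derived only from the agreed initial state and from events that both parties observe, each party computes the same answer, which is what preserves the monitorability of the SLA overall.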