Disclaimer: None of the information provided on this page should be
construed as legal advice. SLAng is under development and should not yet be
used for real SLAs.
Requirements
SLAng is designed so that SLAs expressed in the language meet two carefully
chosen sets of requirements:
Generic requirements: apply to SLAs for any domain. These
include:
- Understandability: agreements of any kind require a mutual
understanding of terms between the involved parties. Such an understanding can
be difficult to establish for SLAs. The qualities of a service must be defined
in such a way that it is clear how to measure them. Constraints on the
behaviours of the parties that impact on QoS must be clearly defined, so that
the cause of SLA violation can be unambiguously attributed to the party at
fault.
- Precision: for any given behaviour of the system it should
be unambiguously clear what the consequences are in terms of penalty payments
defined in the SLA.
- Practicality: SLAs can be useful in a number of different
contexts. They are part of contracts, and must therefore be understandable in
legal contexts. They must also inform monitoring technologies, and conformance
to SLAs must be checked. They may be used in technologies that adapt
application deployments automatically to optimise performance or resource usage.
Therefore SLAs should be expressed formally, but using technologies which
make them easy for humans to understand and manipulate.
- Monitorability: it is unwise to accept an undertaking from
another party to fulfil a responsibility or pay a penalty if you will not
be able to tell whether the party has fulfilled the responsibility. Therefore,
SLAs should only place conditions on events that can be monitored by both
parties, or reported on by trusted third-parties.
ASP requirements: determine the set of constraints that it
is appropriate to include in an SLA for application-service provision. From
a theoretical point of view, it is helpful to consider the requirements that
the various parties will wish to apply to the events occurring in the scenario.
In the above diagram, events corresponding to request dispatch, request
receipt at the service, response dispatch and response receipt by the client
are labelled x, y, z and w. These events
can be considered to have two major attributes: the time that they occur and
the content of the message being sent or delivered.
The client's requirement in this case is on w relative to
x: w must occur within some time-limit of x and
must be the correct response given the content of x and the state
of the service when x occurred. These are latency and reliability
constraints.
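As a minimal sketch of the client's requirement, the following Python records one request/response exchange by the timestamps of the four events x, y, z and w, and checks the combined latency and reliability condition. The `Exchange` type, the millisecond units and the `correct` flag (standing in for whatever oracle decides whether a response is correct) are assumptions for illustration, not part of SLAng.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One request/response pair, timestamped in milliseconds (hypothetical type)."""
    x: int  # request dispatched by the client
    y: int  # request received by the service
    z: int  # response dispatched by the service
    w: int  # response received by the client
    correct: bool  # assumed oracle: was the response right for x and the service state?


def client_requirement_met(e: Exchange, max_latency_ms: int) -> bool:
    """Latency and reliability: w must occur within max_latency_ms of x and be correct."""
    return e.correct and (e.w - e.x) <= max_latency_ms


def latency_split(e: Exchange) -> tuple[int, int]:
    """Split the round trip into service time (z - y) and network time.

    Assumes the client and service clocks are synchronised; otherwise the
    cross-party differences y - x and w - z are not meaningful.
    """
    service = e.z - e.y
    network = (e.y - e.x) + (e.w - e.z)
    return service, network
```

The split illustrates why attributing a violation is hard: the client observes only w - x, while the service observes only z - y.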
The service requires that y does not occur too often, as requests
may potentially overwhelm the service. This is a throughput constraint.
The ISP requires that x and z do not occur too frequently
as these may overload the network. These are also throughput constraints.
A language for ASP SLAs must allow the parties to agree SLAs that insure
these requirements by associating penalty payments with their violation.
Multiple SLAs may be required.
SLAng design points
Designed for precision and usability:
The SLAng semantics are defined by associating a model of the language with a
model of service usage. The presence of SLA elements imposes constraints on the
possible behaviours that can be observed in the model. The model is defined
using EMOF, a language similar to UML class diagrams. This makes it easy for
IT professionals who need to reason about QoS to see how the model corresponds
to their systems. The constraints are described using the Object Constraint
Language (OCL), which has well-defined semantics of its own, so the
constraints are unambiguous.
Designed for practicality:
The adoption of EMOF as the language for the meta-modelling of SLAng confers
a number of practical advantages. EMOF is a standard of the
OMG, and is intended for the design of
meta-data repositories. The OMG also define a pair of specifications
for meta-data interchange which may be used for languages defined using EMOF.
The first, XMI, is based on XML, meaning that SLAng SLAs can be exchanged in
a manner compatible with existing XML-based web-services standards, such as
WSDL and SOAP. The second, the Human-Usable Textual Notation (HUTN), is more
suitable for human use. The UCL MDA tools, used to define SLAng and implement
the Eclipse plug-in for the language, support both of these standards.
Designed for monitorability:
Central to the design of SLAng is the idea of monitorability - the idea that
SLAs should only place constraints on events that are visible to both parties.
If both parties can monitor the SLA, then they can determine whether the other
party is cheating when violations and penalty payments are determined.
The challenge in the context of application services was to design a
language which can express SLAs insuring the requirements of the parties
described above in a monitorable fashion, and in a safe fashion, such that
parties only insure requirements that they themselves can guarantee or have
insured. In this scenario, it is clear that the client can monitor events
x and w, the service provider can monitor y and
z, and the ISP can monitor all events.
A theoretical analysis of the ASP scenario reveals that for latency and
other requirements, only a single system of SLAs is possible in which all SLAs
are both monitorable and safe. As displayed below, this is the configuration in which the ISP offers
a guarantee to the client insuring the performance of the electronic service
(an ES SLA) as it is delivered to the client's interface. The
service provider then enters into a second SLA with the ISP insuring the
performance of the service as it is delivered to the network.
Note that this configuration of SLAs is monitorable, because both SLAs
only constrain events that are visible to the parties to those agreements. The
configuration is also safe. The service only insures conditions that they can
guarantee directly. The ISP insures the total round-trip QoS for the client.
However, if a delay or error is introduced by the service, then the service
provider compensates the ISP, which can then compensate the client.
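The monitorability argument can be checked mechanically. The sketch below records which of the events x, y, z and w each party can observe, as stated above, and tests whether an SLA constrains only events visible to every party to it. The party names, the tuple encoding of an SLA, and the hypothetical direct client-service SLA included as a contrast are illustrative assumptions.

```python
# Events each party can observe in the ASP scenario, as described in the text.
VISIBLE = {
    "client": {"x", "w"},
    "service": {"y", "z"},
    "isp": {"x", "y", "z", "w"},
}


def is_monitorable(parties: tuple[str, ...], constrained: set[str]) -> bool:
    """An SLA is monitorable if every party can observe every event it constrains."""
    return all(constrained <= VISIBLE[p] for p in parties)


# The two SLAs of the configuration described above.
ES_SLA = (("client", "isp"), {"x", "w"})        # ISP insures round-trip QoS at the client
SERVICE_SLA = (("service", "isp"), {"y", "z"})  # service insures QoS at the network
# Hypothetical contrast: the client contracting directly with the service.
DIRECT_SLA = (("client", "service"), {"x", "w"})
```

The direct SLA fails the check because the service cannot observe x or w, so it could not tell whether a violation was caused by the network or by its own behaviour.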
Designed for the ASP scenario:
The language is currently capable of expressing the following types of
constraint, implied by the ASP scenario:
- Throughput constraints
Throughput constraints are required to satisfy the requirements of the
service provider and the ISP that they are not overwhelmed by request or
response traffic. Throughput constraints are straightforward to define in
a monitorable fashion, as we are only concerned with the arrival rate of
requests or responses.
Precisely defining the throughput constraint requires some consideration
however. Average throughput over a long period may not be a good indicator
of peak load. Furthermore, constraining the interval between any two requests
may place unnecessary constraints on the client's implementation. Most
distributed architectures have a notion of request buffering or message queuing.
Therefore it makes sense to define throughput in relation to the degree
of request or response concurrency that the service can accommodate.
To achieve this, SLAng specifies throughput constraints by specifying the
size of a sliding window, and specifying the maximum concurrency of requests
within this period.
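A minimal sketch of such a throughput check, assuming each window is the half-open interval (t - window, t] and that request timestamps are plain numbers; the function names are hypothetical. Because the number of requests in a window can only increase when a request arrives, it is enough to examine windows that end at a request.

```python
def max_requests_in_window(times: list[float], window: float) -> int:
    """Largest number of requests in any window (t - window, t]; window must be positive."""
    ts = sorted(times)
    best = 0
    start = 0
    for end, t in enumerate(ts):
        # Drop requests too old to lie in the window ending at t.
        while ts[start] <= t - window:
            start += 1
        best = max(best, end - start + 1)
    return best


def throughput_ok(times: list[float], window: float, max_requests: int) -> bool:
    """True if no sliding window contains more than max_requests requests."""
    return max_requests_in_window(times, window) <= max_requests
```

For example, with request times `[0, 1, 2, 10, 10.5, 11]` and a window of 2, the busiest window is (9, 11], which contains three requests.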
Note that the sliding window notion gives rise to the useful concept of
a period of over-utilisation. Within such a period, any sliding window
would violate the throughput constraint. SLAng allows the specification of
penalties related to throughput violations to vary with the length of the
period of over-utilisation.
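Periods of over-utilisation can be found by sweeping over the instants at which the number of requests in (t - window, t] changes: it rises by one when a request arrives at t_i and falls by one at t_i + window. The sketch below returns those periods as half-open intervals and applies a purely hypothetical penalty schedule (a rate per unit of time beyond a grace length); SLAng's actual penalty definitions may differ.

```python
def overutilisation_periods(times: list[float], window: float,
                            max_requests: int) -> list[tuple[float, float]]:
    """Half-open intervals [start, end) of instants t at which the window
    (t - window, t] holds more than max_requests requests."""
    # A request at t_i is counted in windows ending in [t_i, t_i + window).
    changes = sorted([(t, +1) for t in times] + [(t + window, -1) for t in times])
    periods: list[tuple[float, float]] = []
    count = 0
    start = None
    i = 0
    while i < len(changes):
        now = changes[i][0]
        # Apply every change at this instant before testing the count.
        while i < len(changes) and changes[i][0] == now:
            count += changes[i][1]
            i += 1
        if count > max_requests and start is None:
            start = now
        elif count <= max_requests and start is not None:
            periods.append((start, now))
            start = None
    return periods


def overutilisation_penalty(periods: list[tuple[float, float]],
                            rate: float, grace: float) -> float:
    """Hypothetical schedule: `rate` per unit of time beyond `grace` in each period."""
    return sum(max(0.0, (end - start) - grace) * rate for start, end in periods)
```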
- Timeliness and reliability constraints
In SLAng, timeliness and reliability constraints are treated uniformly.
Overdue responses from the service may have no business value to the client, so
may be treated as a failure. Similarly, the business value of an incorrect
response must be reckoned to be zero.
Reliability constraints have similar requirements to throughput constraints.
Reliability should not be reckoned over too long a period; otherwise
unacceptably long intervals in which the service is inaccessible may occur, yet
appear to be balanced out by periods of frantic correct behaviour. At the same
time, in a given timescale, some failures may be tolerable. Therefore in SLAng we define
reliability constraints by specifying the maximum number of failures that
may be observed within a sliding time-window.
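Combining the uniform treatment of timeliness and reliability with the sliding window gives a sketch like the following. Each outcome is classed as a failure if its response was incorrect, arrived too late, or never arrived; failures are then counted in windows of the form (t - window, t]. Dating each failure by its request dispatch time x, and the `Outcome` type itself, are assumptions made for illustration.

```python
import math
from typing import NamedTuple


class Outcome(NamedTuple):
    x: float        # time the client dispatched the request
    w: float        # time the response was received (math.inf if it never arrived)
    correct: bool   # whether the response content was correct


def is_failure(o: Outcome, max_latency: float) -> bool:
    """Late, missing and incorrect responses are all treated as failures."""
    return (not o.correct) or (o.w - o.x > max_latency)


def reliability_ok(outcomes: list[Outcome], max_latency: float,
                   window: float, max_failures: int) -> bool:
    """True if no window (t - window, t] contains more than max_failures failures.

    Assumption: each failure is dated by its request dispatch time x.
    window must be positive.
    """
    failures = sorted(o.x for o in outcomes if is_failure(o, max_latency))
    start = 0
    for end, t in enumerate(failures):
        while failures[start] <= t - window:
            start += 1
        if end - start + 1 > max_failures:
            return False
    return True
```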
Again, the sliding time-window gives rise to the notion of a period of
poor-performance, in this case unreliability, which may be used to vary the
application of penalties. Note that the parameters of a reliability clause
need to consider any throughput constraints that apply concurrently, as these
will constrain the maximum number of requests that may be made in total within
the sliding window.
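This interaction can be quantified. If a throughput clause allows at most N requests in any window of length W_t, a reliability window of length W_r is covered by ceil(W_r / W_t) throughput windows, so it holds at most N * ceil(W_r / W_t) requests, provided the client respects the throughput clause. A reliability clause tolerating at least that many failures can therefore never be violated. A sketch of this sanity check, with hypothetical names:

```python
import math


def max_requests_bound(throughput_window: float, throughput_max: int,
                       reliability_window: float) -> int:
    """Upper bound on requests in any reliability window (t - W_r, t].

    That window is covered by ceil(W_r / W_t) consecutive throughput windows,
    each holding at most throughput_max requests if the client complies.
    """
    k = math.ceil(reliability_window / throughput_window)
    return k * throughput_max


def reliability_clause_is_vacuous(throughput_window: float, throughput_max: int,
                                  reliability_window: float, max_failures: int) -> bool:
    """Sufficient (not necessary) test that a reliability clause can never be violated."""
    return max_failures >= max_requests_bound(
        throughput_window, throughput_max, reliability_window)
```

For example, 10 requests per 60 s and a 300 s reliability window give a bound of 50 requests, so a clause tolerating 50 failures in that window would constrain nothing.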
- Availability constraints
The common notion of availability is related to the state of the service. It
may either be ready to be used, or not. However, in the ASP scenario the client
can only observe the state of the service by submitting requests. If these
fail then the service will be reckoned unreliable, so this contingency is
already handled by SLAng.
On the other hand, a service may be clearly broken, in that it never
responds correctly to a particular request. Under these circumstances, the
client will not wish to poll the service continuously in order to remain
eligible for penalties and to discover when the service is eventually fixed.
Furthermore, the service provider would not wish the client to receive
penalties related to errors which were not promptly reported.
Therefore, we have assumed that an additional channel of communication exists
between the client and the provider of the SLA besides the use of the service.
SLAng includes availability constraints which are related to the exchange
of bug reports and bug fix reports via this channel.
Availability penalties can be used to provide an incentive for the client
to report bugs. They also provide an incentive for the service provider to
fix them in a timely fashion.
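A minimal sketch of an availability penalty driven by the side channel, assuming that each bug is represented by the time of the client's report and the time of the provider's fix report, that only the time a bug stays open beyond an agreed allowance is penalised, and that penalties accrue at a fixed rate. All of these names and the penalty formula are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugRecord:
    reported: float          # time the client's bug report was sent on the side channel
    fixed: Optional[float]   # time of the provider's fix report, or None if still open


def availability_penalty(bugs: list[BugRecord], now: float,
                         allowed_fix_time: float, rate: float) -> float:
    """Hypothetical availability penalty.

    Each bug accrues `rate` per unit of time that it stays open beyond
    `allowed_fix_time`, measured from the client's report. Failures before a
    bug is reported accrue nothing, giving the client an incentive to report.
    """
    total = 0.0
    for bug in bugs:
        closed_at = bug.fixed if bug.fixed is not None else now
        total += max(0.0, (closed_at - bug.reported) - allowed_fix_time) * rate
    return total
```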
- Data currency and recovery constraints
An aspect of reliability is that the data manipulated by the service should
be correct and uncorrupted. SLAng includes data currency and data recovery
constraints to ensure this.
A data currency constraint applies to an operation of the service, and states,
for some data types manipulated by that operation, that the data should be
the most recent data that is older than some threshold age. The threshold age
accommodates time taken to complete transactions, and can also be used to
specify operations for accessing historical data. Data currency constraints
are a type of reliability constraint, and are defined in the same fashion, but
with reference to a different mode of failure of the service.
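A sketch of a data currency check for a single data item, assuming its update timestamps are known in sorted order and that each response reports the timestamp of the version it returned. The required version is the newest one that is at least `threshold_age` old at the time of the request; returning anything older counts as a currency failure. Treating a newer version as also acceptable is an assumption of this sketch, as are the function names. Such failures could then be counted in a sliding window like other reliability failures.

```python
import bisect
from typing import Optional


def required_version_time(update_times: list[float], request_time: float,
                          threshold_age: float) -> Optional[float]:
    """Timestamp of the newest update at least threshold_age old at request_time.

    update_times must be sorted ascending. Returns None if no update is old enough.
    """
    cutoff = request_time - threshold_age
    i = bisect.bisect_right(update_times, cutoff)
    return update_times[i - 1] if i > 0 else None


def is_currency_failure(update_times: list[float], request_time: float,
                        threshold_age: float, returned_version_time: float) -> bool:
    """A response fails if it returns a version older than the required one.

    Assumption: a newer (within-threshold) version is also acceptable.
    """
    required = required_version_time(update_times, request_time, threshold_age)
    return required is not None and returned_version_time < required
```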
A data recovery penalty may be invoked if data is lost by the service. If
the service loses some data, it may never again be able to satisfy some data
currency constraints, and in the future its output will be regarded as corrupt.
Therefore, the service provider must report that the data has been lost, and
give an age of data that has been recovered and is now regarded as current.
Since data loss is never wholly acceptable, this incurs a penalty, which may
be varied according to the reported age of the data recovered.
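As an illustration of a penalty that varies with the reported age of the recovered data, the sketch below uses a hypothetical tiered schedule: the older the recovered data (and hence the more recent data that has been lost), the larger the penalty. The tier boundaries, amounts and function name are invented for the example.

```python
def recovery_penalty(recovered_age_hours: float,
                     tiers: list[tuple[float, float]],
                     max_penalty: float) -> float:
    """Penalty for a reported data loss, given how old the recovered data is.

    tiers: (max_age_hours, penalty) pairs in ascending order of max_age_hours.
    Ages beyond the last tier incur max_penalty.
    """
    for max_age, penalty in tiers:
        if recovered_age_hours <= max_age:
            return penalty
    return max_penalty


# Hypothetical schedule: up to 1 hour of data lost, 100; up to a day, 1000.
TIERS = [(1.0, 100.0), (24.0, 1000.0)]
```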
Equally important to determining the type of constraints included in an SLA
is determining when they should apply. SLAng permits the varying of SLA
constraints according to:
- Absolute time: conditions may be associated with schedules determining when
they should apply.
- The state of some mutually monitorable system: conditions may be associated
with defined states of some system. The current state of the system must be
mutually monitorable by the parties to the SLA in order to preserve the
monitorability of the SLA overall. The state will be mutually monitorable if
the initial state of the system when the SLA begins can be agreed, and events
indicative of all changes of state are mutually monitorable by the parties.
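A sketch of how a monitor might decide whether a clause currently applies, combining a weekly schedule with a state derived by replaying mutually monitored events from the agreed initial state. The `Schedule` type, the transition table and the rule that unrecognised events leave the state unchanged are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Schedule:
    weekdays: frozenset[int]  # 0 = Monday ... 6 = Sunday
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        return moment.weekday() in self.weekdays and self.start <= moment.time() < self.end


def replay_state(initial_state: str, transitions: dict[tuple[str, str], str],
                 events: list[str]) -> str:
    """Current state from the agreed initial state and mutually monitored events.

    Assumption: an event with no matching transition leaves the state unchanged.
    """
    state = initial_state
    for event in events:
        state = transitions.get((state, event), state)
    return state


def clause_applies(schedule: Schedule, allowed_states: set[str],
                   moment: datetime, state: str) -> bool:
    """A clause applies only inside its schedule and in one of its allowed states."""
    return schedule.contains(moment) and state in allowed_states
```

Because both parties replay the same agreed events from the same agreed initial state, they reach the same state, which is what keeps the SLA as a whole monitorable.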