Quality of service is a major requirement for telecom customers. Service providers want to ensure that their networks are able to meet the QoS levels sold as part of the service level agreement (SLA), and they want the ability to anticipate problems in the supporting infrastructure that could lead to non-compliance.
To accomplish this, their systems must analyze information from a variety of sources to detect service degradation and also report the information in real time to the staff and other OSS, so that actions can be taken to restore the level of service, inform the customer and calculate any discounts that may apply.
In addition, providers must measure and optimize services to minimize network costs and increase return on investment. This is a challenge because of the dynamics of a growing customer base, changing technology and evolving service offerings.
Service providers usually achieve the first objective—enforcing the SLA—by batch processing and aggregating the data into reports. The collection of information can occur in real time, depending on how the system extracts fault and performance data from the service components, but the reports are usually generated in batch mode.
The second objective—anticipating problems that can lead to non-compliance with the SLA—requires real-time processing so that corrective action can immediately be taken to avoid the SLA violations and their associated contractual penalties, or at least to limit the consequences should an SLA violation occur.
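The difference between after-the-fact reporting and anticipation can be illustrated with a downtime budget. The sketch below is only an illustration: the 99.9 percent availability target, the 30-day period and the 80 percent warning level are assumed values, and the names `DowntimeBudget` and `status` are hypothetical rather than part of any product.

```python
from dataclasses import dataclass


@dataclass
class DowntimeBudget:
    """Tracks outage minutes against an availability target (assumed values)."""
    target_availability: float = 0.999   # assumed SLA target, not from the article
    period_minutes: int = 30 * 24 * 60   # assumed 30-day measurement period
    warn_ratio: float = 0.8              # assumed early-warning level
    downtime_minutes: float = 0.0

    @property
    def allowed_minutes(self) -> float:
        # Total outage time the SLA tolerates over the period.
        return self.period_minutes * (1.0 - self.target_availability)

    def record_outage(self, minutes: float) -> None:
        # Called as outages are detected, not at the end of the period.
        self.downtime_minutes += minutes

    def status(self) -> str:
        used = self.downtime_minutes / self.allowed_minutes
        if used >= 1.0:
            return "violated"
        if used >= self.warn_ratio:
            return "at risk"
        return "ok"
```

With these assumed values the budget is about 43.2 minutes per period, so a real-time system could raise an "at risk" warning once accumulated outage passes roughly 34.6 minutes, whereas a batch report would only show the breach after the fact.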
Numerous components—physical or logical entities that help to deliver a service—may participate in the service delivery. They include the transmission layer (DWDM, PDH, SDH, SONET), the broadband layer (ATM, Frame Relay), the radio layer (GSM, GPRS, UMTS), the IP layer (MPLS), the access layer (ADSL), the cross-technology layers (WAP), the servers (DNS, RADIUS, mail, video, SMS, Web, LDAP), the applications (database, OSS software, trouble ticketing systems, help desk systems, proprietary applications) and service provider business processes.
Even this list is not exhaustive, and delivering a particular type of service may involve a combination or a subset of these components. But it is important to note that these components interact with one another to deliver the end customer services, and that one failure in one component may have an impact on a set of these services.
Service providers today have tools that allow them to manage their service components. However, the challenge is to aggregate the information these tools deliver with some degree of automation and real-time features.
Today’s State of the Art
Service providers need service quality management, even without SLAs, simply to monitor the components that deliver the services. This scenario includes cases where the components of provisioned services are very diverse in terms of technologies, service platforms, applications, servers and business processes, and where managing systems exist but are technology-specific. It also involves instances in which the system can extract a lot of information but the information does not reflect the end-to-end service reality and is too technical for the end user to understand. In many cases, providers concentrate on the highest revenue-generating customers but do not accurately measure service degradation trends, leading to difficulties in planning resources.
Today operators use a variety of technologies and tools for managing and controlling the quality of service delivered by their networks. These tools can be classified by management domain and include alarm management, traffic and performance management, server and application management, and active probes.
Alarm Management
Alarm management products collect alarms from managed elements or from subnetwork managers.
Some of these platforms allow the management of abstract objects like services and propagate the alarm impact from the managed physical or logical entities to the service objects. For example, it is possible to model an entire SONET network and to define propagation rules so that when equipment fails (for example a port), the link going through the port is down, and all the unprotected trails using the link are down.
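Propagation rules of this kind can be pictured as a dependency graph that is walked whenever a component fails. The following is a minimal sketch under that assumption; the class `PropagationModel`, its methods and the object names are hypothetical and do not describe any particular alarm management product.

```python
from collections import deque


class PropagationModel:
    """Hypothetical dependency model: a failed component impacts its dependents."""

    def __init__(self):
        self._dependents = {}

    def add_rule(self, component, dependent):
        # Rule meaning: "dependent goes down when component goes down".
        self._dependents.setdefault(component, set()).add(dependent)

    def impacted_by(self, failed):
        # Breadth-first walk over the propagation rules.
        impacted = set()
        queue = deque([failed])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in impacted:
                    impacted.add(dependent)
                    queue.append(dependent)
        return impacted


model = PropagationModel()
model.add_rule("port-1", "link-A-B")
model.add_rule("link-A-B", "trail-unprotected-7")
# A protected trail is assumed to survive the link failure,
# so no rule is added for it.
print(sorted(model.impacted_by("port-1")))
```

Service objects can be added to the same graph as further dependents, which is how a port failure would surface as an impacted service.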
With alarm management, it is possible to signal that a service is affected when one or more of its major components are faulty. This is certainly valuable information to the network operations center. However, the mere absence of faulty components may not accurately indicate the entire service quality. For example, it is possible to have an ATM switch up and running, but some of the permanent virtual circuits (PVC) may not conform to the QoS that has been sold to the customer. Alarm management cannot itself derive such information. Also, if a given service involves an application like voice or video transfer that uses this ATM PVC, it will not be possible to detect service degradation just by gathering fault information.
Most service providers already have an alarm management system, but they need to make sure this system is connected to the systems that monitor SLA performance and QoS. The real integration issue is to maintain the consistency and the relations between the objects managed by the alarm management platform on one side, and the service objects managed on the other side.
For example, an operator creates a new ATM path between two routers and defines IP services that use this path. In this case the operator must also define a relation between the ATM path and the services, so that trouble occurring on the path will trigger a warning about the impacted services.
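Keeping the two object models consistent is largely a bookkeeping problem, and a periodic cross-check is one plausible way to catch drift. The sketch below assumes each platform can export its inventory as simple Python structures; the function `check_consistency` and the object names are hypothetical.

```python
def check_consistency(alarm_objects, service_relations):
    """Compare the alarm platform inventory with service-to-component relations.

    alarm_objects: set of object names known to the alarm platform.
    service_relations: dict mapping a service name to the set of
    components it uses. Both are hypothetical simplifications.
    """
    referenced = set().union(*service_relations.values())
    return {
        # Components a service depends on that the alarm platform does not know.
        "unknown_components": {
            service: sorted(components - alarm_objects)
            for service, components in service_relations.items()
            if components - alarm_objects
        },
        # Alarm platform objects that no service is related to.
        "unrelated_objects": sorted(alarm_objects - referenced),
    }


report = check_consistency(
    alarm_objects={"atm-path-R1-R2", "atm-path-R3-R4"},
    service_relations={
        "ip-service-gold": {"atm-path-R1-R2"},
        "ip-service-silver": {"atm-path-R5-R6"},
    },
)
```

In this example the report flags `ip-service-silver` as depending on a path the alarm platform does not manage, and `atm-path-R3-R4` as a managed path whose failure would not warn about any service.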
Traffic and Performance Management
Performance management is the ability to understand and analyze the historical, current and future performance of the network infrastructure and other systems. The main performance management activities include performance monitoring, data collection, reporting and trending.
Traffic and performance management applications give a good view of, and sometimes some control over, the behavior of the components they oversee. These applications are usually used for controlling network performance. For example, performance management of the ATM layer will extract from the network the parameters that will allow the operator to evaluate whether the network respects the quality of service that has been sold to the customer.
When a service involves the combination of ATM and other components such as applications, the network performance alone cannot reflect the exact state of the service. For example, if a constant bit rate ATM PVC is used for video transmission, the video server itself will also influence the quality of the delivered video yet may provide no measurable performance information. In such cases alarm management is also required to deliver a global view. In this example a log file monitor might be used to report alarms from the video server.
By analyzing only the traffic and performance of the network, operators cannot evaluate the SLA compliance of a high-value service that uses the analyzed layer merely for transport. For a video-on-demand service on top of ATM or IP, for example, even if the network health is good, the global status of the service will also depend on the video server behavior.
The traffic and performance management must be one of the inputs used by the service management system to analyze the delivered QoS level.
Just as with alarm management, the real issue is to maintain the consistency and the relationship between the objects that the performance platform manages on one side, and the service objects that the service management platform directs on the other side.
As an example, when a performance platform is configured to collect two router parameters, these parameters must be defined as key performance indicators in the service management platform, and the relation between the objects in the two models must be maintained.
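One way to picture this relation is a KPI definition that names both the service-side indicator and the performance-platform object and parameter it comes from. The sketch below is illustrative only: the `KpiDefinition` fields, the parameter names `cpu_load_pct` and `packet_loss_pct`, and the threshold values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiDefinition:
    name: str            # KPI name in the service management platform
    source_object: str   # object name in the performance platform
    parameter: str       # collected parameter on that object
    max_value: float     # assumed threshold above which the KPI is degraded


def evaluate(definitions, samples):
    """Return {kpi_name: 'ok' | 'degraded' | 'no data'} for each definition.

    samples maps (source_object, parameter) to the latest collected value.
    """
    results = {}
    for kpi in definitions:
        value = samples.get((kpi.source_object, kpi.parameter))
        if value is None:
            results[kpi.name] = "no data"   # relation broken or nothing collected
        elif value > kpi.max_value:
            results[kpi.name] = "degraded"
        else:
            results[kpi.name] = "ok"
    return results


definitions = [
    KpiDefinition("router-1 CPU", "router-1", "cpu_load_pct", 80.0),
    KpiDefinition("router-1 loss", "router-1", "packet_loss_pct", 1.0),
]
samples = {("router-1", "cpu_load_pct"): 42.0}
print(evaluate(definitions, samples))
```

A "no data" result is worth surfacing in its own right, because it usually means the relation between the two models has been lost rather than that the service is healthy.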
Server and Application Management
Application management is how the service provider controls applications that are part of communications services. It includes the ability to keep the applications up all the time, monitor them for failures or performance problems, and tune them to maintain adequate performance for end users.
Application management focuses on the real-time control of servers and applications. The applications can provide specific functions such as location-based services or more generic ones such as databases. Controlling the servers can be based on various system parameters such as CPU load and free disk availability.
Monitoring a server and its hosted applications is mandatory in the analysis of the QoS that the customer receives, but it is clearly not enough. If the server is up and running but the network is down or degraded, the service is not delivered, and using application management alone often affords no way to signal that failure.
The main technical issue in this case is again to maintain the consistency between objects managed by various systems.
Active Probes
As enhanced QoS capabilities make the Internet and other network architectures more complex, this richer transport and service infrastructure requires greater monitoring capabilities. ISPs offering enhanced transport, content hosting and differentiated hosting services, as well as their customers, demand methods to monitor the quality of the services they buy.
One such method involves active probes—devices or embedded software for measuring the end-to-end path or service from the user’s perspective.
If a probe is put somewhere in the network, it will give significant information on the end-to-end quality of the service, from the service access point to the server.
Yet if the quality is bad, there is no way to know if it’s the network’s fault or the server’s. If the quality is good, it means the network from the service access point to the server is good, and the server itself is good; but that still doesn’t demonstrate the quality is good from the other service access points of your network. Effectively a probe is only a small sample of the user experience.
For example, an end user with a handset is connected in New York on a mobile network to an IP service. His connection is bad, yet the service provider has a probe testing the server in Houston, and the performance parameters from this probe are good. How can the operator explain the customer complaint? How does the service provider learn of the trouble in its service quality?
So with probes alone, the only thing that can be verified is that the QoS is good from one probe to a server.
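The coverage gap can be made explicit by listing the service access points that no probe measures. The sketch below is loosely modeled on the New York example; placing the probe at a Houston access point is an assumption made for illustration, and the function and server names are hypothetical.

```python
def uncovered_access_points(access_points, probe_paths, server):
    """Return access points with no probe measuring the path to `server`.

    probe_paths: iterable of (access_point, server) pairs that probes test.
    """
    covered = {ap for ap, target in probe_paths if target == server}
    return sorted(set(access_points) - covered)


gaps = uncovered_access_points(
    access_points=["Houston", "New York"],
    probe_paths=[("Houston", "ip-service")],   # the only probe in this sketch
    server="ip-service",
)
print(gaps)
```

Here the New York access point shows up as unmeasured, which is why good probe results alone cannot rule out the customer's complaint.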
Bringing It All Together
To integrate these various software components, a service management system must expose flexible, open interfaces in both the so-called “northbound” and “southbound” directions. The term “northbound” denotes the upward interfaces that exchange processed data with other OSS and BSS applications, while “southbound” denotes the interfaces to the sources of service quality data.
Northbound interfaces allow communication with CRM and billing. Southbound interfaces allow parameters to be collected from the systems described earlier.
A system can use standard industry interfaces like CORBA, but doing so assumes that the legacy applications on the other side also understand CORBA. Today, integrations based on EAI bus technology offer more flexibility for interfacing with all types of applications. Using XML as the interface language embedded in the EAI messages could also be a key differentiator.
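To make the idea of XML embedded in EAI messages concrete, the sketch below builds and parses a small SLA event using Python's standard `xml.etree.ElementTree` module. The message layout (`slaEvent`, `measurement`, `value`, `threshold`) is entirely hypothetical and does not follow any published schema or any specific EAI product.

```python
import xml.etree.ElementTree as ET


def build_sla_event(service, customer, kpi, value, threshold):
    """Serialize a hypothetical SLA event as an XML string."""
    event = ET.Element("slaEvent", {"service": service, "customer": customer})
    measurement = ET.SubElement(event, "measurement", {"kpi": kpi})
    ET.SubElement(measurement, "value").text = str(value)
    ET.SubElement(measurement, "threshold").text = str(threshold)
    return ET.tostring(event, encoding="unicode")


def parse_sla_event(payload):
    """Read the same message back, as a northbound consumer might."""
    event = ET.fromstring(payload)
    measurement = event.find("measurement")
    return {
        "service": event.get("service"),
        "customer": event.get("customer"),
        "kpi": measurement.get("kpi"),
        "value": float(measurement.findtext("value")),
        "threshold": float(measurement.findtext("threshold")),
    }


payload = build_sla_event("video-on-demand", "customer-42", "cell_loss_ratio", 0.02, 0.01)
print(parse_sla_event(payload))
```

Because the payload is self-describing text, a billing or CRM application on the bus needs only an XML parser to consume it, not a shared object request broker.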
The successful service quality management application must encompass a variety of domains, and all aspects (network and IT) of information must be integrated so that any component that relates to the service health is taken into account, measured and analyzed in real time.
The application must reflect a true service view that is oriented toward the end user. It must aggregate and calculate information from all sources to measure and prove service health, and provide alerts when SLAs are violated. Finally, it must allow services to be monitored from an SLA standpoint, but also help to optimize the service infrastructure.
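A very simple form of such aggregation is to take the worst state reported by any source that contributes to a service. The sketch below assumes that policy purely for illustration; real systems may weight sources differently, and the state names and source names are hypothetical.

```python
SEVERITY = {"ok": 0, "degraded": 1, "down": 2}


def service_health(component_states):
    """Return the worst state reported by any source for a service.

    component_states maps a source name (alarms, performance, application,
    probe) to one of the states in SEVERITY. Worst-of aggregation is an
    assumed policy, not one the article prescribes.
    """
    if not component_states:
        return "unknown"
    return max(component_states.values(), key=SEVERITY.__getitem__)


video_on_demand = {
    "alarms": "ok",
    "atm_performance": "ok",
    "video_server": "degraded",
    "probe": "ok",
}
print(service_health(video_on_demand))
```

In this example the network sources all report a healthy state, yet the service is marked degraded because of the video server, which matches the end-user view the application is meant to reflect.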
Didier Camous, Compaq TeMIP consultant, has worked on the design and delivery of network and service management solutions since 1996. Today, he is responsible for Compaq’s OSS/TeMIP technical pre-sales team. He can be reached at didier.camous@compaq.com.