Amidst all the promises around real-time services, IMS, IPTV and next-generation networks, what’s typically missing is a realistic look at what needs to be invented to make it all happen. The idea of a real-time service environment is appealing, but it’s also a significant stretch for most operators today. As executive management and marketing voice their plans and create aggressive deadlines, it’s up to the engineers to make the grand vision of a real-time service environment a reality. Currently, engineers are re-examining their IT environments and finding that provisioning will turn inside out. Service assurance will become far more complex. Data and transaction volumes will escalate exponentially. Today service providers are dealing with just the first layer of new issues, and no one is quite sure how many layers there are to go, or what effects new service launches will have on networks and operations. Given the fundamental nature of the challenges faced in the first mile, leading service providers are at best three years away from delivering and supporting a real-time, integrated services environment.
From the Service’s Perspective
The way service providers look at networks is changing fundamentally. Most operations and networks are designed from the perspective that a network technology is equivalent to a service. Now the perspective is changing to consider what capabilities are necessary to deliver a service across multiple domains to any end point at any time. The perspective is changing from a one-to-one relationship, with service tied to a specific technology, to a many-to-many relationship, where numerous services with distinct delivery and management requirements must traverse and share a diversity of networks.
In the midst of this shift, what becomes less important is whether a service is transported across an ATM, Frame Relay, Ethernet or other network technology. What is important instead is to understand the capabilities of each of these networks, how they hand off to each other, and which cross-domain paths can satisfy the requirements of each specific service for successful delivery with the appropriate quality.
This change in perspective sends ripples throughout service provider operations, because they are organized on a per-technology or per-service basis. When people talk about operational silos or stovepipes, they are referring to the one-to-one relationship model. In this model a service is either equivalent to a network technology—like an ATM path—or inextricably linked to a technology, as POTS is to TDM networks. The idea that a service is distinct from the technology it rides, and that the same service might be delivered to different end points by completely different means, is very new and has significant ramifications for service delivery and management.
“The whole management challenge is to build this complete stack,” says Martin Creaner, CTO of the TeleManagement Forum. “You’re not just dealing with the individual management challenges for each technology.”
As a result, a provider must ask what capabilities are required to deliver each service, and what data is needed to manage it and diagnose and repair any potential problems. Capabilities are abstracted from network technologies, so that where once engineers might have asked, “What ATM paths are available?” they now might ask, “What QoS capabilities can network paths that traverse multiple technology domains provide?” This simple question reflects massive changes in how service delivery and management are performed to enable a real-time, value-added services environment.
Provisioning by Capability
The impact on provisioning alone is radical, because the way it must be performed is practically the opposite of the traditional approach. Until now provisioning has been linear, technology-specific and reactive to an order for service. When a customer submits a service order, it kicks off the process of breaking down the order into its constituent parts. Then each part is assigned to the appropriate technology silo, where the individual piece is designed, assigned and activated. Once all of the pieces are set up, the end-to-end service can be tested. If any problems occur they must be identified, and the proper silo must take steps to repair or re-provision its piece until the problem is resolved.
This is a reactive, slow-moving process that cannot support real-time demands and does not consider the relationships among multiple access and transport technologies that can be involved in providing the requested service. In this model, there’s not necessarily a centralized, end-to-end view that provides the service’s perspective of the network resources in use and the source of any problems.
In the real-time environment, service provisioning makes an assumption that capable network paths are already in place so that a service can be delivered on demand to any accessible end point. As a result provisioning becomes “more a validation of network capability than a process of handing out discrete resources,” says Peter Briscoe, worldwide product manager for Cramer Systems.
Validating network capability, however, means that network capabilities across multiple domains must be understood and abstracted in order to expose the right information on which to base provisioning decisions. Once that’s accomplished, says Briscoe, a service provider must build a service definition layer on top of those exposed network capabilities. Service definitions would include the requirements necessary to deliver a given service, so that those requirements can be mapped to valid and available network capabilities in real time as part of the provisioning process.
This process of abstraction and generalization is not only necessary to enable real-time provisioning decisions, but also to shield service delivery from ongoing changes in the network. If the provisioning process can key off capability rather than technology, it minimizes the impact of network change on the provisioning process itself. As such, a service provider could define a migration process to understand what facilities and paths serve end points today and what that will look like tomorrow. An ATM path today could be an MPLS path tomorrow, but either one could provide enough bandwidth and QoS to deliver the same set of services.
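To make the idea of provisioning by capability concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any vendor's data model: the class names, fields, QoS scale and sample values are all hypothetical. It shows a service definition being matched against abstracted path capabilities, and the same decision surviving a change of underlying technology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathCapability:
    """Abstracted view of one cross-domain path (hypothetical fields)."""
    path_id: str
    technologies: tuple      # e.g. ("DSL", "ATM"); informational only
    bandwidth_mbps: float
    qos_class: int           # assumed scale: higher means stricter guarantees
    endpoints: frozenset     # end points this path can reach


@dataclass(frozen=True)
class ServiceDefinition:
    """Requirements a service places on any path that carries it."""
    name: str
    min_bandwidth_mbps: float
    min_qos_class: int


def capable_paths(service, endpoint, paths):
    """Return the paths that reach the end point and meet the requirements.

    Note that the technology list is never consulted: the decision keys
    off capability only.
    """
    return [
        p for p in paths
        if endpoint in p.endpoints
        and p.bandwidth_mbps >= service.min_bandwidth_mbps
        and p.qos_class >= service.min_qos_class
    ]


# Illustrative values only
iptv = ServiceDefinition("iptv", min_bandwidth_mbps=8.0, min_qos_class=3)
today = [PathCapability("p1", ("DSL", "ATM"), 10.0, 3, frozenset({"home-42"}))]
tomorrow = [PathCapability("p1", ("DSL", "MPLS"), 10.0, 3, frozenset({"home-42"}))]

# The same path qualifies before and after the ATM-to-MPLS migration
print([p.path_id for p in capable_paths(iptv, "home-42", today)])
print([p.path_id for p in capable_paths(iptv, "home-42", tomorrow)])
```

Because the matching function ignores the technology field, swapping ATM for MPLS changes the inventory record but not the provisioning logic.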
Policy’s Increasing Role
Because a service can take many potential, valid network paths to its destination, policy management and execution become central to service management and delivery. In the real-time environment, provisioning decisions must be made on the fly: first to validate that a user is authorized to invoke a service, second to ensure that a valid and capable network path is available, and third to determine the type of device (be it a set top, PC, VoIP phone or handheld device) using the service.
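The three on-the-fly checks described above can be sketched as a single decision function. This is a simplified illustration in Python, not a description of any policy product: the lookup tables, names and delivery profiles are hypothetical stand-ins for what would really be subscriber, inventory and device systems.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool
    reason: str
    profile: str = ""   # delivery profile chosen for the device, if allowed


# Hypothetical in-memory stand-ins for subscriber, network and device data
ENTITLEMENTS = {"alice": {"iptv", "voip"}}
CAPABLE_PATHS = {("iptv", "home-42"): ["p1"]}
DEVICE_PROFILES = {"set-top": "hd-stream", "handheld": "low-bitrate-stream"}


def decide(user, service, endpoint, device_type):
    """Apply the three checks in order and stop at the first failure."""
    # 1. Is the user authorized to invoke the service?
    if service not in ENTITLEMENTS.get(user, set()):
        return Decision(False, "user not authorized")
    # 2. Is a valid and capable network path available to the end point?
    if not CAPABLE_PATHS.get((service, endpoint)):
        return Decision(False, "no capable path")
    # 3. What type of device is using the service?
    profile = DEVICE_PROFILES.get(device_type)
    if profile is None:
        return Decision(False, "unsupported device")
    return Decision(True, "ok", profile)


print(decide("alice", "iptv", "home-42", "set-top"))
print(decide("bob", "iptv", "home-42", "set-top"))
```

In practice each check could be answered by a different system, which is exactly why the question of where policy is coordinated matters.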
Consider that within the IMS domain, the home subscriber server (HSS) component is essentially a policy server. This policy server, however, will still have to answer to a higher level policy manager that looks across multiple domains. There is significant debate over how this policy architecture will come to fruition, with much of the discussion centered on components like a service capability interaction manager (SCIM), which handles the interactions among integrated services. The problem is that policies can be distributed among and executed by a number of systems spanning both the network and IT worlds. “If you take your policies and spread them out, you’ll have a ton of them,” says Jim Ghadbane, vice president of product management for Bridgewater Systems. “A policy engine can deal with transactions as they are starting up to help the network make a decision.”
Policy Versus Inventory
A policy engine is a centralized policy manager that can help to coordinate policy configuration while also overseeing and orchestrating consistent policy execution. What’s open to debate is the best place to centralize policy management and whether an existing platform, like an inventory system, can handle policy management or if a new, distinct policy management function and component must be introduced.
“As policy management becomes a key topic, inventory management must change to become an integral part of driving policy into the network,” says Cramer’s Briscoe. In this line of argument, policies are logical elements involved in service delivery and management, so an inventory system that can manage logical constructs well could be a good place to centralize policy management, in step with management of other physical and logical network resources. As policies change, Briscoe says, the impacts of those changes on the availability of specific network resources, like capacity, must be understood and accounted for. He admits, however, that most inventory products “don’t do ‘logical’ well today,” and that adding policy management would stretch those products even further.
Bridgewater’s Ghadbane provides a further counter-argument and says that a policy manager must be highly available and highly reliable in order to meet five-nines network expectations. Two reasons he gives for not using an inventory platform are that they “aren’t fast enough” and they “aren’t designed so that if you power them off something bad won’t happen in the network.” In its traditional role, if an inventory system shuts down, he argues, it means a service provider can’t provision new users for a short span of time. In the real-time world, however, “you have to be able to continuously deliver services without outages, which means the policy system must be 100 percent reliable.”
Upgrading Service Management
In parallel to the changes and unanswered questions relating to provisioning are a host of issues about service management. As Ghadbane suggests, service delivery becomes continuous in the real-time world, because services can be invoked on demand and they must be continuously managed to ensure quality. With service inter-working or integration on the horizon, the challenge becomes being able to manage individual services within the same user session and multiple user sessions conducted over common transport and access infrastructure. To accomplish this, a service provider must have complete visibility: across the network and down to the customer device; within the service stream, with an understanding of how interacting services affect each other; and into packet contents, to understand exactly which services are being used and what network paths they traverse.
To gain this detailed level of visibility and understanding of service behavior, service providers must collect and handle an immense volume of management data. “The amount of data can be terrifying, in a sense,” says Paul Gowans, triple play business development manager for Agilent Technologies. “Just look at the signaling data for a VoIP call versus a circuit switched call. You might have five messages for a circuit call, but if you want to do a VoIP call with Megaco and SIP, you end up with hundreds of messages.”
Yet the data burden goes beyond fault and signaling data, he says, because problems can occur with interacting services within media streams that have nothing to do with transport or access networks. Several carriers are looking at the concept of deep packet inspection, which would enable them to look into media streams to monitor service behavior and diagnose very specific problems a customer might experience.
“Troubleshooting and customer management must be inextricably linked,” says Gowans. “If you can’t take the information and drill in to look at VoIP traffic and see if something happened to affect the TV service, you’ll just have the same problem again. … You need to aggregate, sample and filter to make sense of the data you have, but then be able to drill in to figure out how to fix [problems]. It won’t be a matter of going to the set-top or any other individual device, but rather all of those things. That’s the new paradigm—understanding the interrelationships of services and managing from a customer experience perspective.”
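The aggregate-then-drill-in pattern Gowans describes can be illustrated with a small Python sketch. The record layout, customer identifiers and loss figures below are invented for illustration. The first function is the kind of summary that flags a customer, and the second shows which of that customer's services degraded in the same minute.

```python
from collections import defaultdict

# Hypothetical quality records: (customer, service, minute, packet_loss_pct)
RECORDS = [
    ("cust-1", "iptv", 0, 0.1),
    ("cust-1", "voip", 0, 0.0),
    ("cust-1", "iptv", 1, 4.2),
    ("cust-1", "voip", 1, 3.9),
    ("cust-2", "iptv", 1, 0.2),
]


def worst_loss_by_customer(records):
    """Aggregate: the worst packet loss seen per customer, across all services."""
    worst = defaultdict(float)
    for customer, _service, _minute, loss in records:
        worst[customer] = max(worst[customer], loss)
    return dict(worst)


def drill_in(records, customer, threshold_pct):
    """Drill in: for one customer, list the services degraded in each minute."""
    hits = defaultdict(list)
    for cust, service, minute, loss in records:
        if cust == customer and loss >= threshold_pct:
            hits[minute].append(service)
    return {minute: sorted(services) for minute, services in hits.items()}


print(worst_loss_by_customer(RECORDS))
print(drill_in(RECORDS, "cust-1", threshold_pct=1.0))
```

Seeing the voice and TV services degrade together in the same minute is the kind of interrelationship that looking at a single device or service would miss.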
The benefit of performing this sort of detailed data collection is that once the information is gathered, it can be filtered and applied to many different uses, including troubleshooting, billing and detailed modeling of customer demand. Incorporating this expanded data set into back-office systems and processes, though, is no small task. “You can’t have a situation where you can’t send data from one system to another because that type of information doesn’t exist in the information model,” says Bruno Codispoti, solution partner with systems integrator BusinessEdge Solutions.
He explains that one of the biggest problems service providers face is in cracking open and expanding the information models of existing operations systems to account for the new, much broader data set. It might not be possible to centralize management of the new data set without changing individual information models, Codispoti says, because “the information model is the heart of operations for these systems. That’s the life blood of the application, and every tentacle of the system uses it to some extent. If you’re saying you want to remove that piece, maintain it centrally and still have a functioning organism, it’s a very tough problem to solve.”
Service-Oriented Architecture
There may not be a way to avoid expanding information models within individual systems, but there are potential approaches for centralizing access to the new data set in order to ensure high performance and data synchronization in the real-time environment. Service-oriented architecture (SOA) is one approach that service providers are beginning to consider.
“These SOA models try to put a veneer around everything, expose interfaces and make data accessible as it’s needed,” says Codispoti. “You can leave the information model in each system, though you’ll have to expand it, but then you can retrieve or supply information from other places centrally. The bad news is you have another thing to maintain, but how else do you do it?”
With SOA in mind, service providers are looking at the concept of operational data stores, according to John Trembley, director of business development for Oracle and its TimesTen real-time data store product. The operational data store would be a centralized location for organizing subscriber-related data, not dissimilar to the HSS component of an IMS architecture. By organizing data around the customer—rather than around a service or technology—it may become easier to manage everything from troubleshooting to business intelligence, as it is derived from this centralized subscriber data repository. An effective SOA implementation might enable a service provider to have access to both real-time and historical data for a range of purposes, and yet have an architecture that’s geared for performance for the sake of front-end systems and real-time transactions.
“If you’re going to do an SOA, you’re breaking the links between the applications and where all of the data lives,” says Trembley. In other words, the SOA can expose the data that applications need regularly without placing demands on the back-end systems that have just been cracked open to expand their information models.
An SOA consists of three layers. The first involves composite applications, which are any applications that require action such as signing on a new subscriber, checking an account balance or anything else that is performed repeatedly. The middle layer involves reusable processes that understand the actions of composite applications and can retrieve the actual bits of data they require. The most commonly accessed data can live in this middle layer to promote high performance on the front end. The bottom layer is where the core databases and processes live. It is the underlying data store for the middle and top layers, and is essentially the archive from which less commonly used data can be retrieved when necessary.
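A toy Python sketch of the three layers follows. All class and function names are hypothetical, and it deliberately ignores hard problems such as keeping cached data synchronized with the back end. It only shows how a middle layer that holds commonly accessed data keeps repeat requests away from the core store.

```python
class CoreDataStore:
    """Bottom layer: stand-in for the core databases (system of record)."""

    def __init__(self, records):
        self._records = dict(records)
        self.reads = 0   # counts how often the back end is actually hit

    def fetch(self, subscriber_id):
        self.reads += 1
        return self._records[subscriber_id]


class BalanceProcess:
    """Middle layer: reusable process that keeps commonly used data close."""

    def __init__(self, store):
        self._store = store
        self._cache = {}   # no invalidation here; a real system would need it

    def balance(self, subscriber_id):
        if subscriber_id not in self._cache:
            record = self._store.fetch(subscriber_id)
            self._cache[subscriber_id] = record["balance"]
        return self._cache[subscriber_id]


def check_account_balance(process, subscriber_id):
    """Top layer: a composite application built on the reusable process."""
    amount = process.balance(subscriber_id)
    return f"Subscriber {subscriber_id} balance: {amount:.2f}"


store = CoreDataStore({"s1": {"balance": 12.5}})
process = BalanceProcess(store)
print(check_account_balance(process, "s1"))
print(check_account_balance(process, "s1"))
print("back-end reads:", store.reads)
```

The second request is served from the middle layer, so the back-end read count stays at one.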
While SOA is a commonly accepted approach in many enterprise IT shops, it remains relatively new and somewhat controversial in the service provider community. “There are a number of service providers that have deployed pieces of an SOA, but it’s a highly contested IT architecture and creates complex cross-functional teams that debate what a service is, what’s a composite application, and so on. It’s taking these companies a long time to get through all of that. You have to get these people to agree on what it all means, and that slows down adoption quite a bit,” says Trembley.
In other words, even if it’s agreed that an SOA approach is the best way to deal with the new volumes of data while maintaining operational performance, how to implement an SOA can at worst be a source of friction among operations groups and at best a painstaking process of definition, organization and compromise.
Service Assurance and Problem Diagnosis
Even if the data load is wrangled and an SOA or similar approach is implemented, there is still the issue of gaining visibility across multiple network domains so that collected data can be put into service- and customer-oriented contexts to drive service assurance and problem diagnosis.
“To understand what’s affecting your service at the top, you need to understand all of these things in a common context,” says Dave Walters, manager of product marketing for EMC Smarts. That common context could be enabled via a topology manager that can provide visibility among network layers and across network domains, while displaying how services traverse them all and thus how a problem in one layer or domain affects others around, above or below it. “Carriers are asking us to provide an integrated fault management system for these services, but it’s a matter of how much the provider wants to bite off. … What we’ll do as part of the fault management strategy … [is to] lay out the different domains and how they relate. We are a topology system of record for many customers,” says Walters.
Topology management sometimes lives in an assurance application, but in many cases it can be housed in a real-time provisioning solution. Either way, it’s a multi-domain, cross-functional element that in effect touches every distinct technology silo to provide a common, centralized view. With such visibility, the challenge then shifts to bouncing the massive amounts of management data—relating to network and service performance and trouble—off the topology view to gain an exact picture of what’s causing a problem, which services and customers are impacted and what can be done to fix it. While this approach may sound logical, consider what it entails: the expanded data models, an SOA to manage it all, a complete topology record, and applications that can take in voluminous amounts of data as symptoms, diagnose specific problems and suggest how to resolve them.
This is a long way to go with a lot of moving parts, just to be able to provide basic assurance, troubleshooting and support in the real-time environment. As a shortcut, says Walters, carriers are looking first to be able to perform “triage” where, at the very least, the domain that is the source of a problem can be readily identified. Once that’s done, the right assurance or repair team can be directed to a problem, thus eliminating the need for a multi-party conference call that can result in more finger-pointing than problem resolution.
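As a rough illustration of triage against a topology view, the Python sketch below maps alarmed network elements to the domain they belong to and the services that traverse them. The topology, element names and counting heuristic are assumptions made for the example. It is not root-cause analysis and does not represent any vendor's fault management product.

```python
from collections import Counter

# Hypothetical topology record: service instance -> ordered (domain, element) hops
TOPOLOGY = {
    "iptv:home-42": [
        ("access", "dslam-7"),
        ("transport", "mpls-core-2"),
        ("service", "video-server-1"),
    ],
    "voip:home-42": [
        ("access", "dslam-7"),
        ("transport", "mpls-core-2"),
        ("service", "softswitch-1"),
    ],
}


def triage(alarmed_elements, topology=TOPOLOGY):
    """Return (suspect_domain, impacted_services) for a set of alarmed elements.

    The suspect domain is simply the one whose alarmed elements sit on the
    most service paths, which is a crude heuristic chosen for illustration.
    """
    alarmed = set(alarmed_elements)
    domain_hits = Counter()
    impacted = set()
    for service, hops in topology.items():
        for domain, element in hops:
            if element in alarmed:
                domain_hits[domain] += 1
                impacted.add(service)
    suspect = domain_hits.most_common(1)[0][0] if domain_hits else None
    return suspect, sorted(impacted)


print(triage(["dslam-7"]))
print(triage(["video-server-1"]))
```

Even this coarse answer is enough to route the problem to the access team or the video platform team rather than convening everyone at once.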
Only the First Mile
Introducing policy management, implementing an SOA and expanding data models across a number of applications require a lot of heavy lifting and IT grunt work. And even so, this only represents the first mile on the road to real-time and says nothing of the interoperability and support challenges found with IMS components and new network devices and standards. There are a mind-numbing number of decisions to be debated, made and implemented around new technologies and architectures, and all of it must be orchestrated and pulled together eventually. When that will happen is anyone’s guess, but it certainly won’t be in time for Christmas.