Billing and OSS World

Batch Systems for Internet Billing

Dr. Matthew Lucas and Dave Labuda, Portal Software
01/01/1999
The continuing rollout of advanced Internet services is placing new demands on billing systems. Instead of billing only for simple flat-rate access, tiered usage on high-capacity lines, megabytes of disk usage and subscription services, Internet billing systems must now deal with complicated QoS-based services and billing arrangements. ISPs and content providers are finding that legacy batch systems are not well suited to meet the fraud, credit management and instant provisioning requirements of on-demand, usage-based IP services. This article looks at why batch-based billing systems struggle to bill for advanced IP services, and considers a new breed of "real-time" systems that promise to meet next-generation IP billing requirements.

Internet services applications and users are fundamentally different from those found in traditional, dedicated telecommunications networks for a number of reasons. First, the Internet protocols (IP, TCP, UDP and associated QoS protocols) allow multiple services to be carried over the same network infrastructure. Thus, production-quality video stream and bulk file transfer applications share the same network resources and transport, although their service requirements are considerably different. Second, beyond access and subscription services, many Internet applications are usage-based, with pricing a function of the application's QoS requirements, marketing promotions and service bundles. For example, during business hours, bulk data transfer might be billed by the byte, voice by the second and video by a combination of throughput (image resolution and frames per second), minutes of use and latency. During off-peak hours, however, a certain number of hours might be included with the customer's service bundle. Finally, Internet billing relationships do not follow a traditional credit-based cycle. Instead, pre-paid and debit models are common. Alternatively, when credit accounts are used, limits must be closely monitored and billed as often as necessary to stay within a threshold, or within the amount specified by a customer's purchase order.

Although IP gives a rich, flexible platform for building new services and applications, the above characteristics place demands on the billing and support systems not found in legacy networks; particularly with respect to fraud, customer self-service, and credit management.

Fraud: One key advantage of the Internet is global connectivity. Clearly, the upside is instant access to 75 million potential customers. The downside is that a provider's fraud exposure is enormous because network access is not secure. For example, consider the wireless industry, which used to lose over a billion dollars annually because of cloning devices that defeated authentication mechanisms. IP carriers and content providers must pay particularly close attention to the fraud issue because with high-speed access a hacker could quickly consume a large amount of service or information. Even worse, a hacker could gain access to an electronic commerce server and instantly defraud providers of a substantial amount of money.

Fortunately, today's IP encryption techniques provide solid authentication of users. However, authentication mechanisms can be defeated if customers give out their PIN or their encryption keys are not secured. Therefore, an effective fraud control system must also consider authorization and usage data. For this technique to be effective, it is critical that usage data is current so that fraud can be detected and, more importantly, prevented.

Customer self-service: Many IP services, by nature, must be provisioned and activated immediately. For example, customers are unwilling to wait days for a dial-in remote access or voice account to be set up via mail, fax or phone with a CSR. Nor are these account setup approaches cost-effective on low-margin services. Instant provisioning requires effective web tools integrated with OSS systems that automatically configure network elements, create an account, and activate services.

Credit management: In addition to on-line provisioning, usage-based IP services require robust, automated credit and account management mechanisms. Given that many accounts will be created online, billing systems must be able to support pre-paid, credit card and debit models. As such, users will require accurate, up-to-the-second access to usage data and balances. In terms of account management, an effective billing system should monitor low balances and proactively keep the account current (e.g., send email, invoice or interrupt service with a payment request). Also, depending on the provider's policies, the billing system must inform network elements to interrupt service, block certain services (i.e., non-lifeline), extend credit or possibly notify a CSR the moment a pre-paid account's balance reaches $0.00 or a credit account exceeds a set threshold. Note that determining when service should be interrupted is not an easy calculation. For example, the exact amount of time to authorize for a prepaid service is a function of peak and off-peak rates, cross-service discounts and even corporate sponsorship of some charges. Another particularly challenging example is a prepaid calling card ID shared among many employees in a department. Here, the credit management mechanisms must factor in simultaneous users when calculating when to cut off service as credit runs out.
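The prepaid calculation described above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: a single daily peak window, prices in whole cents per minute, and no discounts or sponsorship. The names Tariff and authorized_minutes and all of the values are hypothetical and are not taken from any particular billing product.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class Tariff:
    """Hypothetical two-rate tariff; assumes 0 <= peak_start < peak_end <= 1440."""
    peak_start: int     # minute of day when the peak rate begins
    peak_end: int       # minute of day when the peak rate ends
    peak_cents: int     # price per minute during the peak window
    offpeak_cents: int  # price per minute at all other times

    def rate_at(self, minute):
        in_peak = self.peak_start <= minute < self.peak_end
        return self.peak_cents if in_peak else self.offpeak_cents

    def minutes_to_rate_change(self, minute):
        if minute < self.peak_start:
            return self.peak_start - minute
        if minute < self.peak_end:
            return self.peak_end - minute
        return MINUTES_PER_DAY - minute  # run to midnight, then re-evaluate


def authorized_minutes(balance_cents, start_minute, tariff, sessions=1):
    """Minutes of service the balance covers when `sessions` users draw on it at once."""
    if sessions < 1:
        raise ValueError("at least one session is required")
    remaining = balance_cents
    granted = 0.0
    now = start_minute % MINUTES_PER_DAY
    while remaining > 0:
        burn = tariff.rate_at(now) * sessions  # cents consumed per minute
        if burn <= 0:
            raise ValueError("a free period would authorize unlimited time")
        span = tariff.minutes_to_rate_change(now)
        if burn * span >= remaining:
            # The balance runs out before the rate changes again.
            return granted + remaining / burn
        remaining -= burn * span
        granted += span
        now = (now + span) % MINUTES_PER_DAY
    return granted
```

With a peak window of 08:00 to 18:00 at 10 cents per minute and 5 cents otherwise, a 500-cent balance starting at 17:30 covers 30 peak minutes plus 40 off-peak minutes. The same balance shared by two simultaneous users at 17:30 lasts only 25 minutes, which is why the shared calling card case needs the session count as an input.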

Batch processing model

When developing IP billing systems, one obvious approach is to begin with a batch system and modify it to meet the needs of IP services; after all, the batch approach has been widely used to bill for circuit-switched services for decades. The batch model, illustrated in Figure 1, is simple. After a network transaction (e.g., a voice call) is complete, there are three steps. First, the network element generates a usage record (CDR) and forwards it to the rating server. Next, periodically or whenever there is sufficient volume, the CDR "batch" is loaded and each transaction rated. Finally, rated events are forwarded to the billing system and posted to the customer's account.

Figure 1: Data-flow in a batch system.
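The three steps of the batch model can be sketched in a few lines. This is a minimal illustration, not a description of any real rating engine: the CDR fields, the per-minute prices and the round-up-to-whole-minutes rule are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CDR:
    """Step 1: a usage record produced by a network element (fields are illustrative)."""
    account: str
    service: str
    seconds: int


# Hypothetical price list, in cents per started minute.
CENTS_PER_MINUTE = {"voice": 10, "dialup": 2}


def rate(cdr):
    """Step 2: rate one record, rounding usage up to whole minutes."""
    minutes = -(-cdr.seconds // 60)  # ceiling division
    return minutes * CENTS_PER_MINUTE[cdr.service]


def run_batch(cdr_batch, ledger):
    """Step 3: post every rated event in the batch to the customer's account."""
    for cdr in cdr_batch:
        ledger[cdr.account] += rate(cdr)
    return ledger


# A batch is only processed once enough records have accumulated.
batch = [
    CDR("alice", "voice", 61),
    CDR("alice", "dialup", 600),
    CDR("bob", "voice", 60),
]
ledger = run_batch(batch, defaultdict(int))
```

Note that nothing in this flow ever reports back to the network element: the ledger is updated only after the usage has already happened.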

As illustrated in Figure 1, usage information flows in a single direction from the network elements to the billing system. This process is well suited to bill circuit-switched voice for two reasons. First, because circuit-switched voice is usually invoiced on a 30-day cycle, the billing system does not have to be closely synchronized with network usage (e.g., several days before a call is posted is acceptable latency). Second, access to the PSTN is secure. Thus, any calls placed on the "line" are unequivocally the responsibility of the account owner. With a long billing cycle, secure access, and credit-based accounts there is no need for integrating sophisticated fraud control mechanisms (e.g., authentication, authorization and CDR analysis), credit management or customer self-service mechanisms with the billing system.

Another key advantage of the batch model for circuit-voice is simplicity. This manifests itself in several ways. First, in terms of fault tolerance, each of the above processes operates independently and concurrently. Thus, if any of the phases fails, there is sufficient time to reload the data and run the process again without record loss. Likewise, if the interconnection fails, records can be easily retrieved after connectivity is restored, or via tape in the worst case. The simplicity of the model also makes the solution scalable: as the volume of records and the number of network elements increase, each process can be replicated or upgraded as required to meet demand. Finally, the three-step process easily allows usage data to be normalized as it proceeds toward the billing system. In particular, the various CDR formats used by each switch vendor can be normalized and coerced to the same format.

Inadequacies of the batch model for Internet services

When billing for Internet services, it is critical that the billing system is closely synchronized with network activity. This is the fundamental limitation of the batch model, because billing data proceeds in a rigid, three-step process: CDRs are created at the NEs, passed through the rating engine and ultimately posted to the customer's account. The delay incurred at each step significantly impacts the effectiveness of the fraud detection and credit management systems. For example, consider the case where a customer at a college dorm gives out his account information. Within hours, there may be dozens of users on that account. A typical batch-based fraud detection approach is to count the number of minutes of use at month's end, and if there are more than 60 minutes of use in an hour, then there's fraud. Yes, the fraud is detected, but it is too late! Credit management mechanisms also rely heavily on timely usage information from the network elements. For example, in a pre-paid model, the service should be discontinued at the exact moment the balance runs out. Likewise, the credit management system needs to know when the balance on an account drops below a certain threshold, so it may begin service discontinuation notices or give the customer the option to refresh the account online via a credit card.
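The month-end check mentioned above, more than 60 minutes of use within a single hour, might look like the following sketch. The representation of a session as a (start, end) pair of minutes since the start of the billing period is an assumption made for the example, as are the function names.

```python
from collections import defaultdict


def minutes_per_hour(sessions):
    """Total minutes of use that fall in each clock hour.

    `sessions` is a list of (start_minute, end_minute) pairs for one account,
    measured from the start of the billing period (an illustrative format).
    """
    usage = defaultdict(int)
    for start, end in sessions:
        t = start
        while t < end:
            hour = t // 60
            chunk_end = min(end, (hour + 1) * 60)  # stop at the hour boundary
            usage[hour] += chunk_end - t
            t = chunk_end
    return usage


def suspicious_hours(sessions):
    """Hours in which the account logged more than 60 minutes of use."""
    return sorted(h for h, used in minutes_per_hour(sessions).items() if used > 60)
```

The check itself is trivial. The weakness the article describes is in when it runs: by the time the records have been batched, rated and posted, the overlapping sessions ended long ago.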

One solution is to speed up the batch process. That is, instead of taking days or weeks for a network transaction to post, carefully design a system that takes minutes or hours before a transaction is posted. This "near real-time" approach alleviates some of the latency problems found in traditional batch systems. For example, an account set up online can be created in hours instead of days. Likewise, a customer's view of the account is current within the latency of the batch process. However, no matter how fast the batch process runs, there are still the following limitations:

(1) Instantaneous online registration: As a customer enters account setup information, the billing system must interactively validate the data and verify that the network elements are configured to accept the service. Without interactive validation, any erroneous data will prohibit the account setup and require expensive CSR intervention.

(2) Authentication: The customer must be authenticated at the start of each transaction. Although batch systems can verify the identity of the customer, the authentication system must prevent fraud by checking for simultaneous logins or suspicious user activity. Further, once fraud is detected, the billing and OSS systems must identify where the malicious users are entering the network, disable established service, suspend the account and generate trace logs - all while the fraudulent users are online.

(3) Authorization: Each transaction must check with the billing system to see if the service is authorized. At first, this may appear to be static information. In fact, in many cases it is not. For example, if a pre-paid customer has a low balance, then the customer is only authorized to use exactly X number of minutes for that service. Alternatively, a provider's policy might be only to authorize a limited set of services when an account balance falls below $0, or when the customer exceeds a threshold on a credit-based account. These authorization models require integrated support from the billing system.

(4) Service provisioning: Many IP services require in-session dynamic provisioning capabilities. For example, consider a videocast application in which the speaker wishes to spontaneously increase the bandwidth dedicated to the conference because he or she wants higher resolution for an important part of the meeting. From a bandwidth management perspective, this is a hard problem because the OSS system must determine the availability of network resources, and reallocate as appropriate. From a billing perspective, this is equally challenging because the OSS must determine in real-time whether the user's plan allows the upgrade, if there are any limitations (e.g., an off-peak-only bandwidth plan) and if they have enough credit. Of course, throughout the service upgrade the billing system must precisely charge for the network service received. Such dynamic provisioning can't be done with a batch-based billing system.

(5) Customer self-help: As with many usage-based services, customers will require up-to-the-second usage and account information. Thus, not only must each network transaction be posted immediately, current services must also be considered if the customer is to get accurate usage.
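As an illustration of item (2), a check for simultaneous logins has to run at the start of each session rather than after the fact. The sketch below is a deliberately small, in-memory version; the class name, the one-session default and the method names are assumptions for the example, and a production system would also need persistence, timeouts for sessions that never log out, and the suspension and tracing steps described in the text.

```python
class SessionRegistry:
    """Tracks active sessions per account and refuses logins over the limit."""

    def __init__(self, max_sessions=1):
        self.max_sessions = max_sessions
        self._active = {}  # account -> set of active session ids

    def login(self, account, session_id):
        """Return True if the session may start, False if it looks like sharing."""
        sessions = self._active.setdefault(account, set())
        if session_id not in sessions and len(sessions) >= self.max_sessions:
            return False
        sessions.add(session_id)
        return True

    def logout(self, account, session_id):
        """Forget a finished session so the account can log in again."""
        self._active.get(account, set()).discard(session_id)
```

In the college dorm example, the second user would be refused at login instead of being discovered at month's end.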

In effect, in a batch system (near real-time or not) the network elements have no way to interact with the billing system while a transaction is in progress. Thus, the user has carte blanche until the usage data makes its way to the billing system. Many billing experts believe that the batch process cannot be adequately finessed to address the above problems. Instead, billing and support systems must be engineered from the ground up with real-time support for fraud, customer self-service and credit management.

What's so tough about real-time billing?

Anytime you see the word "real-time," you can safely assume this is a complicated system requiring a lot of rocket science (or, more appropriately, computer science). Fortunately, a real-time billing system is not a "hard" real-time system - as you might find in an air-traffic control system, a pacemaker or the space shuttle. That is, if a billing system does not post a network transaction immediately, nobody is going to die.

Fundamentally, however, a real-time billing system shares many of the same design challenges. One of the most important aspects relates to fault tolerance and failure recovery. That is, the system must be configured such that each network element makes the best decision it can with the data that is available. For example, consider a customer trying to launch a video application across a wide-area backbone. Let's say the OSS is able to authenticate the customer and verify his subscription to the service, but is unable to contact the billing system to verify the account balance. It is not acceptable to simply wait for the billing system to respond or for network connectivity to be restored (as this could take several seconds, or several hours). Instead, policies must be in place that either allow or deny service. Developing such failure back-off and recovery logic, and implementing it within the OSS, BSS, and network is difficult. If done correctly, however, the system will have the advantages of real-time billing support, but degrade gracefully to a near real-time system during fault situations.
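The allow-or-deny policy for an unreachable billing system can be sketched as follows. This is a simplified illustration: the exception BillingUnavailable stands in for a timeout or lost connection, the check_balance callable stands in for the real balance query, and the per-service fallback table is a hypothetical provider policy.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


class BillingUnavailable(Exception):
    """Stand-in for a timeout or lost connection to the billing system."""


def authorize(account, service, check_balance, fallback_policy):
    """Ask billing whether the account has credit; fall back to policy on failure.

    `check_balance(account, service)` returns True if credit is sufficient and
    raises BillingUnavailable when the billing system cannot be reached.
    """
    try:
        return Decision.ALLOW if check_balance(account, service) else Decision.DENY
    except BillingUnavailable:
        # Do not wait for recovery: decide now from the configured policy,
        # denying by default for services the policy does not mention.
        return fallback_policy.get(service, Decision.DENY)
```

A provider might choose to allow low-cost services during an outage and deny expensive ones, accepting a small exposure in exchange for availability. Events authorized this way would then be posted once the billing system returns, which is the graceful degradation to near real-time that the paragraph describes.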

The second design challenge with a real-time billing system is performance. Consider Figure 2, which shows the transactions that take place in a typical real-time billing process. First, a message is sent from the network elements to the OSS, to authenticate and authorize the service. Upon authorization, the network elements must then forward accounting information to not only the billing system, but possibly other support systems as well. What is critical in this process is that the queries and updates are quickly processed. Today's standard is the few seconds it takes to set up a call in a circuit-switched network. Therefore, the IP OSS and billing system must respond to AAA messages quickly. Since response time is a function of load, the system must easily scale with the number of services offered, number of users and size of the network.

Figure 2: Real-time call setup.

The final key challenge is integrating the network elements with the OSS and billing system. To illustrate this, consider a customer who wants to increase his quality of service in the middle of a connection (e.g., a video application requiring better performance). Here, the application must send the request to the OSS system. The OSS must, in turn, go through the AAA procedure, tell the network elements to terminate the existing connection, establish the new connection (assuming the resources are available) and, finally, update the billing system. Clearly, the message formats and service primitives must be standardized for this to work. Mediation systems are of great use here (for more information about IP mediation, read "Mediation in IP Networks", Billing World, October 1998). However, other interfaces need to be considered as well. For example, let's say a pre-paid customer's account has run out, or the billing system detects multiple logins or other fraudulent activity. The OSS system must be configured to accept these triggers and take the appropriate action. This is not trivial.

Typical architecture of a real-time billing system

A real-time billing system is typically driven by a transactionally managed, multi-tier architecture (as opposed to client-server). While batch systems are process based (as shown in Figure 1), real-time systems are event driven. This means that events (e.g., service requests, fraud triggers and insufficient credit balances) cause business logic to execute atomically in real-time. The business logic may consider the current state of the subscriber, network or status of other related events to perform the desired action. For example, in cases of customer registration or service provisioning events, external OSS systems need to be transactionally updated with the new information.
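A toy version of event-driven, atomic business logic is sketched below. It is an assumption-laden illustration rather than a real transaction manager: atomicity is imitated by running each handler against a copy of the state and keeping the copy only if the handler succeeds, and the event fields and handler names are invented for the example.

```python
import copy


class EventProcessor:
    """Dispatches events to handlers and applies each one all-or-nothing."""

    def __init__(self, state):
        self.state = state
        self._handlers = {}  # event type -> handler(state, event)

    def register(self, event_type, handler):
        self._handlers[event_type] = handler

    def process(self, event_type, event):
        """Run the handler on a working copy; commit only if it does not raise."""
        working = copy.deepcopy(self.state)
        try:
            self._handlers[event_type](working, event)
        except Exception:
            return False  # discard the working copy: nothing was applied
        self.state = working
        return True


def on_usage(state, event):
    """Hypothetical handler: record usage and debit the account."""
    account = state[event["account"]]
    account["usage_seconds"] += event["seconds"]
    account["balance_cents"] -= event["cost_cents"]
    if account["balance_cents"] < 0:
        raise ValueError("insufficient credit")


processor = EventProcessor({"alice": {"balance_cents": 100, "usage_seconds": 0}})
processor.register("usage", on_usage)
```

If a usage event would overdraw the account, the handler raises and neither the usage counter nor the balance changes. A real system would obtain the same guarantee from database transactions rather than from copying state.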

While each transaction is typically small and well isolated, hundreds or thousands of events of all types may enter the system every second. Since the billing system is interacting with the customer and OSS in real-time, it is critical that the billing system is engineered properly such that response times are limited to the order of 100 ms during peak load. Meeting these latency requirements and transaction rates mandates high-performance OLTP databases. In a properly configured system, the database will be the bottleneck. Thus, it is important that the billing system minimizes its impact on database subsystems by using advanced caching and queuing techniques. In addition, the architecture must look for opportunities to process transactions in parallel. Fortunately, business logic (e.g., rating) is typically stateless. Thus, it can be replicated across multiple servers, thereby achieving tremendous parallel throughput and high availability.
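One simple form of the caching mentioned above is to keep rarely changing rate-plan data in memory so that rating an event does not require a database read. The sketch below uses Python's standard functools.lru_cache; the rate table, the plan and service names and the read counter are hypothetical stand-ins for a real database.

```python
from functools import lru_cache

# Hypothetical rate table standing in for a database; prices in cents per minute.
_RATE_TABLE = {("gold", "video"): 30, ("gold", "voice"): 8}
db_reads = {"count": 0}  # counts simulated database round trips


def _load_rate_from_db(plan, service):
    db_reads["count"] += 1
    return _RATE_TABLE[(plan, service)]


@lru_cache(maxsize=1024)
def cents_per_minute(plan, service):
    """Cached lookup: the database is consulted once per (plan, service) pair."""
    return _load_rate_from_db(plan, service)


def rate_event(plan, service, minutes):
    """Stateless rating: the result depends only on the arguments and rate data."""
    return cents_per_minute(plan, service) * minutes
```

Because rate_event keeps no per-customer state, identical copies can run on many servers in parallel. The price of caching is staleness: when a rate plan changes, the cache has to be invalidated (for example with cents_per_minute.cache_clear()), which this sketch leaves out.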

Another key design decision is whether to centralize or distribute billing data across multiple systems. Ideally, it is both - where logically the data is centralized, but physically it is distributed. This permits centralized maintenance and ensures a consistent view of the data while providing all the benefits of service availability, remote authentication, reduction of transaction collision for the same record, higher scalability and geographical distribution. As illustrated in Figure 3, an effective model of geographically distributing a billing system is pushing the static data (login info, available services, etc.) to the satellites while managing all dynamic data (usage events, credit limits, real-time authorization, etc.) centrally. This approach is particularly elegant because, where possible, it distributes static data with simple database replication, and relies on the centralized dynamic data with real-time event processing where needed.

Figure 3: Distributed billing data processing model.

Finally, the billing architecture must be designed to scale and achieve a high degree of fault tolerance. A distributed architecture with the master system skillfully orchestrating the satellites' parallel operations scales much better than a purely centralized system that is constantly passing data over great distances for every operation. Having the ability to geographically distribute data also ensures higher availability. For example, when the master site is brought down for maintenance, network services are still operational through remote authentication, authorization and billable event collection. If a satellite site goes down, a distributed real-time system can automatically fail over to other satellites or even to the master site. Employing simple backup algorithms greatly reduces the complexity of the decisions that remote sites must make when the data is not available.
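The split between replicated static data and centrally managed dynamic data can be illustrated with a small sketch. Everything here is an assumption made for the example: the Master and Satellite classes, the use of ConnectionError to signal that the master is down, and the plain-text password comparison, which is a simplification that a real system would replace with proper credential handling.

```python
class Master:
    """Central site that owns all dynamic data (here, posted usage events)."""

    def __init__(self):
        self.up = True
        self.usage = []

    def post_usage(self, event):
        if not self.up:
            raise ConnectionError("master unavailable")
        self.usage.append(event)


class Satellite:
    """Remote site holding a replicated copy of the static login data."""

    def __init__(self, master, credentials):
        self.master = master
        self.credentials = dict(credentials)  # replicated static data
        self.pending = []                     # events collected while master is down

    def authenticate(self, login, password):
        """Served locally, so it keeps working during master maintenance."""
        return self.credentials.get(login) == password

    def record_usage(self, event):
        self.pending.append(event)
        self.flush()

    def flush(self):
        """Forward queued events in order; stop quietly if the master is unreachable."""
        while self.pending:
            try:
                self.master.post_usage(self.pending[0])
            except ConnectionError:
                return
            self.pending.pop(0)
```

While the master is down, the satellite still authenticates users and collects billable events; once the master returns, a call to flush delivers the backlog in its original order.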

Conclusion

There is substantial momentum in the industry today to move towards real-time billing. Microsoft has migrated MCIS from a batch system to real-time. Cisco is developing DEN specifically to address real-time online provisioning. Real-time is driven by the commoditization of access, and creative pricing of differentiated services. ISPs must be able to tell network elements ahead of time what customers can do, and how much credit they have left. Finally, a real-time system complements the flexibility of the underlying IP transport. For example, if there is excess bandwidth, the billing system should be able to provision higher QoS to certain subscribers. Likewise, if there are pre-purchased terminating IP telephony minutes available, it should route calls to that provider and automatically offer the best deal and discounts, and so on.

All of this makes a compelling case for real-time billing systems as the best choice for future Internet services. However, there is no free lunch, as such systems are complicated and hard to build. Fault tolerance, performance, and scalability will come with a price. And, as always, there are other considerations. Is the system open? Will the system have the flexibility to keep up with new, next-generation applications and network technologies? Does the ISV expose the API it uses to enhance the system? What modifications can an ISP make to the system? How much of the system behavior can the ISP change? Is the billing system vendor's revenue model based on consulting or on software licensing? Good luck!

About the Authors:

Matthew Lucas is an IP billing and multicast system consultant. He holds a Ph.D. in computer science from University of Virginia and a B.S. in math/computer science from Carnegie-Mellon University. Matthew can be reached at matt@telestrategies.com.

Dave Labuda earned his BS & MS in Computer Engineering from Case Western Reserve University. Dave spent nine years at Sun Microsystems managing UNIX OS development and SunOS releases, and the last five years at Portal as CTO and VP Engineering. In Dave's spare time, he enjoys hiking, wine-tasting, ice hockey and developing complex rating scenarios.

