Virtual Private Networks: Managing Telecom's Golden Horde

Change a word here and there, and a collection of articles about virtual private networks (VPNs) reads like the history of the Mongol hordes. Words like “invasion” are used to describe the influx of VPN services, and “danger” or “death” to describe the future of leased line services. VPNs are growing, especially in hype, but that doesn’t mean frame relay and leased circuits will roll over and die like so many villagers under the threat of armed nomads. Like any other service emerging from the Internet test-bed, the VPN’s success as a mission-critical service will depend on its ability to develop strong management disciplines and integrate with service provider OSS environments. Bottom line: Genghis Khan may have built the largest land-based empire in the history of the world, but when his successors couldn’t adjust to and manage a sedentary culture, the mighty Mongol empire collapsed.

Defining the VPN Realm

VPN is a pretty broad term these days. Sometimes it describes reality, sometimes it describes a potential future. The reality is that VPN services are most suitable today for secure, remote access to corporate intranets. VPNs, in some cases, are being used for cost-effective WAN connectivity, but frame relay seems to be the leading growth technology for such applications.

“We’re talking about a service (VPN) that is largely bought by corporate MIS organizations. These organizations are highly risk averse and they’ll spend money to avoid network outages,” says Bill Phelps, associate partner with Andersen Consulting. “[They’ve] become fairly jaded about carriers and the weaknesses they have in providing assured service levels. But they’ve become very comfortable with frame relay as a compromise solution that’s more cost-effective than dedicated circuits, with more flexibility and reasonable performance.” Phelps says that carrier frame relay offerings continue to mature and are developing some quality of service (QoS) guarantees, a feature inherent in frame relay’s Committed Information Rate (CIR) technology. The problem with IP VPN services for WAN connectivity, he says, is that the economics aren’t quite there to drive them. “When you ask, ‘Why don’t you throw out your frame relay network and get a VPN?’ The answer is, ‘It’s not going to lower costs enough to make it attractive,’” says Phelps.

VPN’s Compound Bow

In the future, VPNs may threaten frame relay and leased lines by providing fixed and remote access to all of a user’s advanced voice and data services: in effect, a ubiquitous desktop. “DSL CLECs don’t think they’ll be able to achieve their investors’ aspirations just through selling high-speed Internet access. When they can go to a 10 or 20 person office and say, ‘we’ll replace your 20 phone lines with two of our pipes and a customer premise device that gives you both dial tone and IP connectivity,’ I think that will accelerate the pace of VPN utilization. The economics then become not just about a cheaper data connection, but a multi-service connection,” says Phelps. If VPNs are to reach their conceptual potential, the ubiquitous desktop, they’ll have to support the entire service spectrum.

Getting into the voice services game, however, means entering the realm of 99.99 percent availability, regulatory demands such as 911, and flexible provisioning at least at the level of adds, moves, and changes on a PBX. Early VPN equipment has been relatively weak on management, a red flag for the VPN’s potential downfall. As service management capabilities have begun to evolve recently, a new set of questions has come into play. Currently, VPN provisioning processes are relatively complex and time-intensive, which raises questions about scalability. Perhaps more fundamentally, there isn’t much communication between the VPN world, still mainly in the Internet domain, and telco OSS developers.

Marking Conquered Territory: Establishing a VPN

When a customer orders VPN service, explains John Lawler, VPN Product Manager for VPN service provider Concentric Networks, several processes begin. First, the customer needs connectivity to Concentric’s network, which generally means ordering a T1 from the serving LEC. The LEC process to fulfill this order generally takes about 30 days (so much for telco provisioning processes), during which time Concentric focuses on security. Concentric first performs a security analysis to determine what level of security, and what types of encryption, the customer needs. In cases where a customer generates a high level of non-VPN traffic, firewalls may also be established at multiple sites, or existing firewalls may need to be analyzed. Most of the equipment configuration is done in Concentric’s lab, not on-site. “We don’t do drop ships,” says Lawler, “but we do pre-configure the equipment basically following the 90/10 rule. We’ll configure according to the worksheets, but we’ll often see on site that the customer forgot to mention something.” VPN engineers are then brought in to finalize implementation and spot-fix any problems.

Controlling Each Domain: VPN Provisioning

Most VPN devices are delivered with some kind of SNMP management interface, but they don’t always provide much in the way of intelligent control or integration with higher-level OSSs. Moreover, SNMP management for VPNs isn’t standardized; the IETF has yet to release a VPN-specific MIB, so most vendors are building proprietary extensions into the SNMP interfaces.

Telco OSS vendors, in general, haven’t built VPN-specific support functions into their products yet, which leaves them looking to the equipment manufacturers for help. “In a meeting with a start-up edge device vendor, there was a huge concern that the device be supplied in a ‘manageable’ package. That is, accompanied by whatever element or configuration management is needed to make the devices provisionable from the telco perspective and able to co-exist with the legacy OSS world,” says John Borden, President and CEO of configuration and inventory OSS supplier Granite Systems. Provisioning systems for VPN devices are generally provided by the device vendors themselves, but are based on proprietary protocols and interfaces. Examples of such systems include Cisco Systems’ CiscoWorks products, Assured Digital’s AMS, Ascend Communications’ Navis, and Xedia’s Access View.

A “manageable” provisioning package for a VPN architecture is likely a policy server, which from a TMN perspective sits in both the element management and service management layers. The policy server “knows about the state and configuration of the network as a whole. It tracks multiple devices, end to end, and their policies about what level of security certain parts of the network are operating under. It tracks the connections and their policies, like what access control is allowed on these nets. It knows a snapshot of the network at any one time,” says Scott Hilton, vice president of product management with VPN systems manufacturer Assured Digital, Inc. (ADI). The policy and provisioning server, from the service provider’s point of view, expresses the network in terms of virtual domains, where each domain is a specific customer’s network, or part of a customer’s network. “Each domain can be separately defined with its own policies, levels and modes of security, and its own separate topology,” explains Hilton.
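One way to picture the virtual-domain model Hilton describes is as a small data structure. The Python sketch below is illustrative only: the class names, fields, and methods are assumptions made for this example and do not reflect Assured Digital’s actual policy server or its interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Connection:
    """One secure tunnel between two sites (hypothetical fields)."""
    site_a: str
    site_b: str
    encryption: str  # illustrative value such as "3DES"
    allowed_networks: List[str] = field(default_factory=list)


@dataclass
class Domain:
    """A virtual domain: one customer's network, or part of it."""
    customer: str
    security_level: str
    connections: List[Connection] = field(default_factory=list)


class PolicyServer:
    """Holds every domain and can report a point-in-time snapshot."""

    def __init__(self) -> None:
        self.domains: Dict[str, Domain] = {}

    def add_domain(self, name: str, customer: str, security_level: str) -> Domain:
        # Each domain gets its own policies and its own topology.
        domain = Domain(customer=customer, security_level=security_level)
        self.domains[name] = domain
        return domain

    def snapshot(self) -> Dict[str, int]:
        """Return the number of connections currently defined per domain."""
        return {name: len(d.connections) for name, d in self.domains.items()}


# Hypothetical customer with two separately defined domains.
server = PolicyServer()
acme = server.add_domain("acme-wan", customer="Acme", security_level="high")
acme.connections.append(Connection("boston", "denver", "3DES", ["10.1.0.0/16"]))
server.add_domain("acme-remote", customer="Acme", security_level="medium")
print(server.snapshot())
```

Keeping each domain as a separate object mirrors the point that every customer network can carry its own security level and topology, while the server as a whole can still answer questions about the entire network.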

This kind of state and network information is exactly what’s needed, but in the telco world it must be communicated to the overall inventory and configuration systems, and deliverable to the front office. “The network can only know how it happens to be configured at this second, and often can’t tell you enough about that to model a service,” says Borden. “You may be able to add a command interface (for an edge device) to an activation system, but how do you go from there to telling customer care that you’ve used up the resources available in a given area or for a given technology? It all comes down to the configuration data store.” The policy server could be integrated into the configuration process and act as the VPN domain manager. The key is modeling the VPN service into the configuration and inventory systems, and utilizing some kind of standard interface technology, such as CORBA, to tie the systems together. The standardized interface will assist in integration and the service modeling will allow for higher level systems to understand and report on what’s happening in the overall VPN domain and how it relates to each customer’s overall service portfolio.

Assured Digital’s API

Assured Digital permits integration to its policy and provisioning server with two interface options. “We’ve implemented things like LDAP (Lightweight Directory Access Protocol) and some of the other directory focused interfaces to do two things. One is to get external policy provisioning events from an application at the business level. For example you might want to give your customer access for 50 remote users. That would be an event sent to the AMS (Assured Digital’s ADI Management System policy server) and it would create all the policies and the actual configurations. We also support a direct scripted interface to create these events.” LDAP is essentially standardized, but it’s not common in the telco back office. Telcos are accustomed to hearing terms like CORBA and message-oriented middleware, not LDAP. Says Hilton, “We’d like to know from the OSS community what interfaces we need to support. LDAP has been bandied about, CORBA, direct scripted interfaces…We’ve also been following directory enabled networks (DEN).” As the telco and Internet domains continue to converge, some consensus must be reached as to interface technologies. Given the current mix of interface technologies in telco environments, it will be interesting to watch this convergence with everything from LDAP to CMIP to DEN to CORBA to TL1 entering the picture.

Device Management

Like many of its competitors, Assured Digital uses a proprietary agent and protocols for its device management. There are no declared standards, so the vendors have developed these specialized communication pieces themselves. ADI’s agent is a small kernel that sits on the edge of the VPN device and communicates with the policy server over a secure communications channel using a proprietary UDP application. “We have activation, remove service, add service, status, and code download protocols. We’ve also written the actual over-the-wire protocol plus all of the intelligence on how to manage the state on either side. We can make these protocols available and we have ported them into many platforms and operating systems,” explains Hilton. Though they may be portable, every vendor uses its own set of protocols and commands to control and manage these devices. A service provider that chooses to use only one vendor’s systems might ease the integration challenge, since it basically needs to write only to the policy server’s API.

Multi-vendor environments are inevitable, as service providers will look to buy the best equipment of each generation without trashing the old. Again, common interfaces and protocols would be a big help here. Devices can’t survive without strong, relatively easily integrated management functionality, but that is a consensus device vendors and OSS vendors will have to reach together. Right now, the edge device vendors are just learning about telco OSSs, and the OSS vendors are trying to learn about these devices, if they’re looking at all. Regardless, the industry continues to predict this Mongolian-style VPN invasion without considering that volume requires discipline. Just as the Mongols succeeded in battle because the discipline of everyday life on the Eurasian steppe was akin to life in the military, VPNs must develop strong management disciplines fast if they are to survive their own rapid expansion.

VPN Billing

Though Assured Digital hasn’t implemented the technology yet, Hilton claims his company has the ability to produce something akin to a VPN call detail record (CDR). Policy servers can timestamp each VPN session as it begins and ends to determine duration. They can also record the number of bytes exchanged and the security policy associated with each session. This refined information can be stored in a relational database for delivery to a mediation or billing system. The problem isn’t deriving the data, says Hilton. It’s determining what parameters VPN providers will use for billing. Hilton poses the question, “What are they going to bill against? What data, when, and where? It’s one of the key barriers the service providers will have to get through. They can deploy service, but I’m not sure they know how to bill or could bill at a granular level beyond ‘okay, you have five tunnels.’ Maybe that’s all you want right now, but it’s certainly not the sophistication of a private line or a frame network.” Be it billing parameters or interface technologies, VPNs need to mature and develop discipline if they are to avoid the fate of the Mongols, a fate that could leave them nothing but a misconstrued memory in the history of telecom.
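The session data Hilton describes (start and end timestamps, bytes exchanged, and the associated security policy) could be captured in a record along the following lines. This is a minimal Python sketch under stated assumptions: the record layout, field names, and sample values are hypothetical, since the article notes that no such VPN CDR had actually been implemented.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class VpnSessionRecord:
    """A CDR-like record for one VPN session (hypothetical layout)."""
    customer: str
    tunnel_id: str
    started: datetime
    ended: datetime
    bytes_exchanged: int
    security_policy: str

    @property
    def duration_seconds(self) -> float:
        # Duration is derived from the two timestamps the policy server records.
        return (self.ended - self.started).total_seconds()

    def to_row(self) -> dict:
        """Flatten the record into a dict suitable for a database insert."""
        row = asdict(self)
        row["duration_seconds"] = self.duration_seconds
        return row


# Illustrative session: 90 minutes, roughly 12 MB exchanged.
record = VpnSessionRecord(
    customer="Acme",
    tunnel_id="tunnel-1",
    started=datetime(1999, 6, 1, 9, 0, 0),
    ended=datetime(1999, 6, 1, 10, 30, 0),
    bytes_exchanged=12_000_000,
    security_policy="3DES",
)
print(record.to_row())
```

The open question raised in the text remains: a record like this supplies duration, volume, and policy, but which of those fields a provider actually bills against is a business decision the record itself cannot settle.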

The Eternal Network: Today’s VPN Religion

VPNs fulfill a need for point-to-point connectivity. The secure IP tunnels that characterize them are conceptually similar to the leased, point-to-point circuit ancestors they threaten to replace. Though there are competing quality of service (QoS) technologies available, none has gotten the official approval of the IETF yet. Also, IPv6 and its QoS promises aren’t being deployed yet. VPN providers have therefore turned to over-engineering their networks to stay ahead of bandwidth demands and provide the QoS guarantees customers expect.

Packet Loss and Latency

VPN service level agreements (SLAs) are at best rudimentary. SLAs, according to Concentric’s Lawler, are generally based on three measurements. The first two involve packet loss and latency, but they aren’t customer specific. These are generalized measurements that apply to an entire network. As Lawler explains, the IP network that supports VPN service is essentially a cloud framed by large edge routers. Every fifteen minutes or so, each edge router pings every other edge router to which it is connected; that is, it sends each one a ‘can you hear me’ message and records how long it takes for the message to travel down the line and be answered. Each router also records how many packets it has dropped due to congestion. The data from all of the routers is then sent to a performance management application that averages these measurements to come up with a generic packet loss and latency measurement. As long as the network as a whole performs within the service provider’s defined SLA parameters, the provider doesn’t incur a penalty. The question here is, will generic treatment be acceptable to those who can afford to pay a little more for personal attention? Leased circuits are nothing if not reliable, and frame relay provides relatively flexible, dedicated bandwidth. Though these services might be more expensive than VPNs, most agree that the influx of competitive wide area connectivity technologies will continue to drive leased circuit and frame relay prices down. This might make enterprise customers think twice before abandoning their reliable, inherently secure pipelines for a cheaper technology that smells a little like shared bandwidth.
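The network-wide averaging described above can be sketched in a few lines of Python. The sample values, the threshold numbers, and the function name are assumptions chosen for illustration; they are not Concentric’s actual SLA parameters or tooling.

```python
from statistics import mean
from typing import Dict, List


def network_sla_report(samples: List[Dict], max_latency_ms: float,
                       max_loss_pct: float) -> Dict:
    """Average per-router-pair measurements into one network-wide figure."""
    latencies = [s["latency_ms"] for s in samples]
    losses = [s["loss_pct"] for s in samples]
    avg_latency = mean(latencies)
    avg_loss = mean(losses)
    return {
        "avg_latency_ms": avg_latency,
        "avg_loss_pct": avg_loss,
        # The SLA is judged on the averages, not on any single path.
        "within_sla": avg_latency <= max_latency_ms and avg_loss <= max_loss_pct,
        # Reported only to show what the average can hide.
        "worst_latency_ms": max(latencies),
    }


# Hypothetical results from one polling cycle between edge routers.
samples = [
    {"pair": "A-B", "latency_ms": 40, "loss_pct": 0.5},
    {"pair": "A-C", "latency_ms": 50, "loss_pct": 0.5},
    {"pair": "B-C", "latency_ms": 60, "loss_pct": 0.0},
    {"pair": "C-D", "latency_ms": 250, "loss_pct": 3.0},
]
report = network_sla_report(samples, max_latency_ms=120, max_loss_pct=2.0)
print(report)
```

In this made-up cycle the network as a whole stays within its thresholds even though one path is far slower than the rest, which is exactly the concern about generic, non-customer-specific measurement.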

Guaranteeing End-to-End Availability

The third measurement is availability, which in terms of tracking is similar to latency and packet loss. In this case, edge routers “ping” the customer premise devices about every five minutes. If a premise router can’t be contacted, something needs to be fixed. That problem could be on the service provider’s network, on the LEC local loop, or at the customer premise.

For every hour of downtime, says Lawler, customers receive a day’s credit in return. In the unlikely event of a major network outage, the service provider could potentially owe its customers the equivalent of a free year or so. There are some fair caveats, however. Downtime isn’t simply measured for each customer; the cause of the downtime is considered as well. Lawler gives the example, “If someone at the customer site trips over the power cord for the router, that’s not our fault. The amount of downtime it takes to figure out the problem and plug the router back in would be subtracted.” Acts of nature, the greatest potential causes of catastrophic network failures, are also excluded. Concentric, however, does take responsibility for the most common cause of major network outages: backhoes. Fiber cuts count, though Lawler does say Concentric would be pretty upset with the backhoe operator. Further, Concentric not only guarantees its network, but recently announced that it now also guarantees the local loops it sets up with LECs for its customers.
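The credit rule Lawler outlines (a day’s credit per hour of downtime, minus customer-caused outages and acts of nature) amounts to simple arithmetic. The Python sketch below makes two assumptions the article does not confirm: that partial hours are credited pro rata, and that outages are tagged with the cause labels shown, which are invented for this example.

```python
from dataclasses import dataclass
from typing import List

# Causes the article says are excluded from credit (labels are hypothetical).
EXCLUDED_CAUSES = {"customer", "act_of_nature"}


@dataclass
class Outage:
    hours: float
    cause: str  # e.g. "fiber_cut", "customer", "act_of_nature"


def credit_days(outages: List[Outage]) -> float:
    """One day's credit for each creditable hour of downtime."""
    creditable_hours = sum(
        o.hours for o in outages if o.cause not in EXCLUDED_CAUSES
    )
    # Assumption: partial hours earn a proportional part of a day.
    return creditable_hours * 1.0


# Illustrative month: a backhoe fiber cut counts, the other two do not.
month = [
    Outage(hours=3.0, cause="fiber_cut"),
    Outage(hours=1.5, cause="customer"),
    Outage(hours=2.0, cause="act_of_nature"),
]
print(credit_days(month))
```

At one day per hour, roughly two weeks of cumulative creditable downtime would already exceed a year of credit, which is why a major outage could cost the provider the equivalent of a free year of service.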

As of now, Concentric doesn’t have bilateral service agreements with the LECs that serve its customers, though such agreements are something it’d like to see. “If we waited until we had reciprocal SLAs with every LEC, it would take years to provide premise to premise SLAs. In essence we’re gaming, but we certainly did a lot of analysis into the SLAs before we signed off on them. We have a pretty good handle on what kind of down time statistics we’re seeing from various LECs, what kind of risk there is,” says Lawler. Lawler also says that in some regions, Concentric has access to electronic trouble ticketing systems for sending repair requests to LECs, but in many cases the process is manual. Similarly, though Concentric has some tight integration between its internal network management and trouble ticketing systems, the workforce/maintenance management process is still largely manual.

It Doesn’t Matter How Strong a Horseman You Are If You Don’t Feed the Horse

These partially manual processes seem to hold up in the current VPN environment, where traffic volumes are still relatively low. If VPNs do see the kind of growth that’s predicted, can networks continue to expand to keep up? More automated, tightly defined processes will help keep an ever-expanding network running smoothly and keep SLA penalties to a minimum.

If LECs are ever allowed into the long distance business and launch their own VPN offerings, they’ll be able to control their end-to-end quality and mean time to repair because they own the local loop. Without a bilateral service agreement, a VPN provider could be at a serious competitive disadvantage. It’s not necessarily a matter of how reliable the loop is in general; it’s a matter of a LEC being able to say “I can guarantee that if the access pipes fail, I’ll fix yours before the one that goes to some other provider.” Remember that it doesn’t matter how good the VPN network is; the access pipe is the lifeline.
