Data cleansing: What you don't know will hurt you

Comments
Print
Poor data hygiene? It’s not a dirty little secret: The nation’s telecom billing databases contain millions of names and addresses, many of which are redundant, incomplete or just plain wrong. Tracking subscribers as they move, marry, divorce, and change their identities, rank, and titles is a daunting task for telcos, even those without multiple or convergently billed services.

A database with old or duplicate addresses and non-existent accounts is embarrassing and a drain of time and money. Customer service representatives - usually the only human contacts that customers have with telecom providers - depend on accurate customer information. It is bad form, for instance, for a CSR to offer a wireless calling plan to a customer who already has it. Getting the bills right “shows the subscriber that the carrier cares enough to know them,”says Kevin Murtha, senior vice president at FirstLogic, Lacrosse, Wis. Bad addresses also cause headaches: bills don’t get paid quickly, and money is lost when marketing material is sent to customers who no longer exist. Uncle Sam gives big discounts to carriers that comply with U.S. Postal Service address standards for automated mailing, an immediate payoff for telecom providers with accurate subscriber lists.

More than reputation at stake

But great databases don’t just save a telecom’s reputation-they enhance it. A properly integrated database can create a single view of the customer, which is increasingly important to carriers with converging services and billing systems. The telcos need a system that will present marketing departments and CSRs with a comprehensive look at the habits and usage patterns of the subscriber who may eventually sign up for local, long distance, wireless, high-speed data or cable TV from the same provider.

Human beings are always on the move, and so are their accounts. Cindy E. Williams of 225 North West St. decides to move down the street a few houses. She gives the information to her phone company and the CSR keys in the new address. During the transaction, her name and address change to Cynthia Williams, 221 South West St. Dr. Joseph Lawkins marries. His wife, the former “Jane Doe,” becomes the “Mrs.” in “Dr. and Mrs. Joseph Lawkins.” Charles Jackson, a short-time customer, disappears. He moved out of his apartment without returning the cable box. He is long gone, but you still consider him a customer, even after he failed to pay his bill for the third month in a row. And what about keeping track of all of those vanity addresses, such as Oak Ridge Farm, Burning Duck Ranch, or those ever-obnoxious apartment numbers? Apt 2, Apt. 2W, Apartment 345, Dept. 6, ad infinitum.

Keeping track is tough

Larry Bernstein, owner of Have Laptop, Will Travel consulting services in Short Hills, N.J., has seen it all when it comes to off-the-wall addresses. “In Queens, N.Y., the corner is in the street address; if you live at 145-5 16th Ave., you are the fifth house in from 145th Street. That changes the algorithm [in the database],” Bernstein says. “In Brooklyn, it’s similar, but with no hyphen .In New York, no one wants to live in a basement. In the brownstones the basement is called the ground floor, bottom floor or the first floor. Or is the first floor the next floor above the lobby? Some apartment complexes are so big, there is a difference between Apt. 1-A and Apt. A-1. There had to be a template for how that building address was handled. In New Jersey, there are so many Mountain avenues, there’s a lot of confusion about which Mountain Avenue you live on.”

In Seattle, a street ends at the river bank, so low-numbered house addresses begin at the river and increase as the houses string away from the river. When developers built houses on fill extending the opposite direction into the river, Bernstein says, they had to use negative addresses, as in “-15 River Rd.”

FCC sought billing changes

Poor quality databases also prevent carriers from keeping up with changes in federal pricing requirements and other unforeseen accounting requirements. Carriers had a hard time complying with new rules set down last year by the Federal Communications Commission.

On June 1, 1998, the FCC issued a "Memorandum and Order Relating to Tariffs Implementing Access Reform," establishing regulatory guidelines that allow telcos to recover $2.75 per month for secondary business lines and $1.50 per month for secondary residential lines. The FCC considered two methods for recovering access charges from subscribers. The service location method lets carriers simply charge for the primary line and any other secondary line that terminates in the same location. Under the account method, carriers are required to make a reasonable effort to identify the person or business that owns the line before charging the fee. The FCC and subscribers liked the accounting method because it could accurately match subscribers to lines and other information, such as usage and service history. But poor databases made it difficult to use the account method, especially when delineating between people with the same last name living in the same apartment building. For instance, Mark Smith receives the access charge fees for both himself and the unrelated neighbor, Sharon Smith, because the database sees the same last name at the same address.

Faulty LEC databases ultimately led the FCC to implement the service location method, says FCC's Rich Lerner, deputy chief, Competitive Pricing Division, Common Carrier Bureau. LECs simply count multiple lines going into an apartment or house, check the service dates and assign the older line as the primary line." We're not going to look inside to see if you have several family members with different lines," Lerner says. "It should take most of the guesswork out."

The FCC also set new formulas for calculating local access fees paid to LECs by IXCs. Under the new rules, IXCs pay LECs a smaller per-minute fee, but pay an added flat rate per line. Again, carriers with bad databases lost out because the flat charge is based on the kind of line the subscriber uses. The access rate for a primary residential or business line differs from the rate to access a secondary line. What kind of service does the subscriber use: Does the business use a CENTREX or ISDN line? If a business is located on the first floor of an apartment building, is it clear at that address which lines are commercial and which are residential? Again, carriers with unreliable databases couldn't bill accurately.

On June 1, 1998, the FCC issued a “Memorandum and Order Relating to Tariffs Implementing Access Reform,” establishing regulatory guidelines that allow telcos to recover $2.75 per month for secondary business lines and $1.50 per month for secondary residential lines. The FCC sanctioned two methods for recovering access charges from subscribers. The location method lets carriers simply charge for the primary line and any other secondary line that terminates in the same location. Under the account method, carriers are required to make a reasonable effort to determine the actual identity of the person or business entity before charging the fee.

The FCC and customers prefer the second, more tedious method because it is fairer. For instance, using the location method, two Smiths living in the same apartment building create problems. Mark Smith receives the access charge fees for both himself and an unrelated neighbor, Sharon Smith, because the database sees the same last name at the same address. In late March, the FCC implemented the service location definition, where LECs simply count the lines going into an apartment, check the service dates and assign the oldest line as the primary line.

“We’re not going to look inside to see if you have several family members with different lines. It should take most of the guesswork out,” says the FCC’s Rich Lerner, deputy chief, Competitive Pricing Division, Common Carrier Bureau.

The FCC last year set new formulas for calculating local access fees paid to LECs by IXCs. An accurate view of line ownership is vital for figuring the new charges, too, because long-distance carriers now pay less in per-minute charges but are charged a flat rate per telephone line. That flat rate depends on the telephone line the subscriber uses; the rate to access a primary residential or business line differs from the rate to access a secondary line. Does the business use a CENTREX line, or ISDN lines? Is the business located in the bottom floor of an apartment building, so commercial and residential lines are at the same address? To account for those charges, new data element was added to the billing format during the Ordering and Billing Forum committee meeting in spring 1998: the pre-subscribed inter-exchange carrier charge (PICC). LECs were directed to send detailed information to IXCs to back up the charges, including line-type indicators, dates, rates and historical information (See Billing World, June 1998, Page 56). Some IXCs wanted LECs to provide details about individual lines before the IXCs receive the access bills, in order to better track the charges.

Good addresses save money

The dream of a streamlined and accurate database has become so universal that a growing number of companies have begun to specialize in scrubbing and improving billing databases.

Some of the bad data is input at the point of sale. CSRs or part-time solicitors keying in new customer accounts misspell names, add the wrong middle initial, wrong city or ZIP code.

“Data goes in incorrect, but never gets updated,” says Michael Friedl of DialRight Software, Inc., which produces software that automatically updates phone numbers for area code splits.

In spite of that, some companies are willing to accept a certain amount of wrong data. “One of our customers had 30 percent of its database wrong,” says Friedl. “There’s this residual pain that goes on underneath. There’s a certain amount of pain these companies can take. Different companies have different thresholds of pain.”

Group 1 Software, Lanham, Md., specializes in database maintenance functions, including address standardization for mailing efficiency, software for tax jurisdiction assignment, and electronic documentation software that lets telecom providers design personalized bills for individual customers. Group 1, which performs data cleansing for BellSouth Mobility in Florida, developed Merge/Purge Plus, which identifies and eliminates duplicate records. The algorithm even identifies duplicates with slight variations or misspellings in the name and address components; database managers can tweak the name-matching algorithm to find similar names based on “loose,” “medium,” “tight” or “equal” matching parameters. Managers can include or exclude the first name as a requirement for duplicate detection to use the software for either household or unique individual merge/purge operations, and can include apartment numbers as extensions of street addresses, so records will be identified as duplicates only if the apartment numbers match. The system also produces a clean and separate mailing file, and another list all duplicate names.

Saves postage from the get-go

“Mail that is prepared for automation with an incorrect address is only good for delivering that mail to the wrong place as quickly as possible,” says Martin Sohovich of Group 1. “If a mail piece cannot be delivered, the mailer-as well as his or her client-has lost the inherent opportunity in that mail piece.” The system also helps carriers earn postal discounts for mass mailings by complying with the U.S. Postal Service’s Coding Accuracy Support System, a uniform method of writing addresses that can be picked up by automated mailing systems. The discounts save carriers a lot of money, especially if they mail tens of millions of bills and/or marketing material each month.

But there is more to streamlining a database than installing automated software with miracle algorithms to sift and update names and addresses. The difficulty of the task grows exponentially when telecom providers inherit databases; as carriers buy cable companies or ISPs with their own databases and formats, combining those systems into one cohesive unit can be complicated.

“All these companies have legacy systems, and a lot of these companies have files that are suspect,” says Martin Sohovich of Group 1 Software. “How they did it and why they did it certain ways, no one knows; it’s a long history.”

Legacy of pain

“A lot of the problems have to do with disparate legacy systems,” Murtha says. “You’ve got something from MCI, something from WorldCom, all these islands of information. They’re trying to consolidate knowledge about customers and get a hold on who their customers are so they don’t look silly when they call someone.”

According to Murtha, performing data cleansing while switching from a legacy to a client/server platform takes several complicated steps.

“The first step is to architect the data warehouse,” he says. . “That will tell you the platform you’ll use. Once they’ve got the warehouse architected, along with the data marts, the carrier has to pick a [system] platform based on speed, storage capacity, and expandability.”

Once the company picks an architecture and platform, the data can be moved to the new system without interrupting the database’s function. If done correctly, there should be no interruption to the database, Murtha says.

“They’ll take it out of their operational database. They have a second subset of that data, they’ll do the marketing campaign, for instance, and then most likely, post that system back. There’s a real trick to that,” he explains. “You’re trying to take real-time data, enhance it while it’s online. In a lot of cases, they’ll pull that data offline, create a data store, take a snapshot of that data, and put that data back on the server with more storage, or elsewhere.” Such projects can take anywhere from two days to several weeks.

Other Projects Can Take Years

Weldon McLaughlin, area manager of the Wholesale Billing Project Management at SBC, CITY, has spent two years transferring SBC’s and PacBell’s legacy CRIS and CABS databases to a new client-server architected database. The change enables SBC’s wholesale customers to look at a single view of billing information across seven states-incorporating 32 million accounts. The data cleansing was done in-house. The project has been wildly successful, says McLaughlin.

Cincinnati Bell calls in the troops

Cincinnati Bell is one ILEC that called in the data scrubbers, -- those specialists in name-dropping, address consolidation and data management. This RBOC recently overhauled and standardized its existing customer data to create a single view of the customer. Thedata standardization and architecture group hired Origin Technology in Business, Inc., CITY, a subsidiary of Philips Electronics, to perform the work using FirstLogic’s i.d. Centric data cleansing product.

Cincinnati Bell wanted to view multiple customer accounts across all databases. Origin developers first migrated legacy data from the RBOC’s IBM mainframe system into a client/server application consisting of a Hewlett-Packard 9000 UNIX and Oracle 7.3 Enterprise server, which was running Windows NT and Microsoft SQL Server 6.5 database. In one large batch job, Origin extracted the data, cleansed it and loaded it on the Enterprise server, which now serves as the online operational data store; it continues receiving daily transactional data from the mainframe systems. Daily downloads of i.d. Centric ensure that the inserted customer address data into the operational online data store is correct.

The system also catches bad data when new accounts are keyed in, Origin developers say. Cincinnati Bell sales reps verify names and addresses on demand using a sales force automated system (ONYX).The system verifies each new customer order as it is entered on the sales rep’s laptop and downloads it into the in-house system. The system flags and rejects invalid names and addresses. When bad data are detected, an automatic e-mail is sent to the user.

Four steps to cleanliness

At Vality Technology Inc., Boston, developers created a four-step process to clean up and manage untidy databases, says Jeanne Friedman, vice president of marketing. Using its INTEGRITY product, the company helps telecommunications companies consolidate their billing systems to allow a single view of the customer. The product has four capabilities:

Whole investigation: the product performs an overview of the database status quo. “We first find out what the data looks like,” Friedman says. “It gives them a real read of what their data is, looks at data patterns, distribution of values.” For example, “999999999” may indicate a default, where data entry personnel left fields blank. “When you have millions of records, after the operational system has been in operation for a few years, you don’t know what you have in there,” says Friedman.

Address fields may be populated with “doing business as,” “trustee for” and other phrases that hide the data. “People have even put ‘deceased’ in the address field,” Friedman says.

Parsing and standardization: once the disparate styles used for typing data into fields are identified and arranged, the product defines patterns and standardizes data so information reads uniformly in all the fields, Friedman says. For instance, the product can find all of the “doing business as” or comparable phrases in the address fields, which helps determine whether parent companies and subsidiaries are being billed at the same time.

Probabilistic matching: the product performs fine matches across multiple fields and values. For example, says Friedman, “A carrier may have a household with three different phone numbers, three different account numbers. We can go in and match name, phone number and find those records that match. It could be across your whole database.”

The software also performs fuzzy matching, which lets it find “consolidated customers,” people with different names using the same phone number. The function, for instance, can find the name of a wife that shares a phone number with her husband, but retains her maiden name. That allows carriers to design a program to market to such customers. This function also helps carriers more accurately charge access fees and universal service charges.

Survivorship: the product recommends what the addresses should look like: for example, whether “street” should be spelled out or abbreviated. It also can run the new and improved database against third-party address sources, such as the U.S. Postal Services’ CASS or Dunn & Bradstreet’s, to confirm that addresses are correct. Friedman says the system can also automatically correct data entry to default to the standards chosen by the carrier.

AT&T is using Vality’s system to integrate five mainframe billing systems-some of which are on IBM OS/390 platforms-into a single system, Friedman says. “It will provide them with a consolidated view across several billing systems.” Once a database is in good order, it can expand customer base and create snapshots of customer behavior, Murtha says.

“ [The database] deals with customer relationship management. It helps you break down information down to do intelligent comparison of buying patterns, geographic information, and other data.”
That all-important single view
Creating a single database gives convergent carriers an edge.
“It’s easier for them to know where the customer is, helps them provide better customer response, better track the client, better measure the client’s history,” Sohovich says. Databases that pinpoint customers’ usage and habits give carriers “the ability to cross-sell and up-sell bundled services,” says Kay Benaroch, field program manager with EMC Corp. of Hopkinton, Mass. “They know who the leading customers are, people who are ready to buy more multiple services. They can do trending analysis, and find particular population areas-physical neighborhoods-that contain a group of highly technical users. Those people might have more than one phone line, Internet and wireless lines and fax. A carrier can locate and target them for a DSL marketing program. They cannot do that without an integrated view of their database.”

Comments