Best Practices for Data Integration
By Satish Gujar
In addition to the selection of the right network planning tools, a key to the successful adoption and implementation of a planning process for any communication service provider or mobile network operator is data source integration. Planners often spend the bulk of their time on data gathering rather than on true planning. The size and scope of the data required for planning make it impractical to load manually into your network planning tools. In fact, one of the key drivers for adopting a planning system is to streamline and automate the data-loading aspect of the planner’s task.
The data integration effort is often underestimated and poorly implemented, which results in the operator not realizing the full potential benefits of network planning. Data inputs for planning typically include network-inventory data (physical and logical), network statistics or performance data, usage data describing subscriber and device behavior, and subscriber and traffic forecasts. Reference data such as equipment and cost information must also be entered and updated regularly within your network planning tools to keep your planning system current.
The main challenges in a typical large-scale data integration effort are outlined below.
- Multiple data sources and multiple data formats – the required data may live in several repositories whose owners are in different departments. The operator may have inherited various legacy systems because of mergers and acquisitions. Data can also be in various formats such as flat files, XML documents or spreadsheets.
- Data quality – data may be incomplete and mismatched. Organizations may establish different naming conventions for the same data. For example, circuit names may differ across systems.
- Data volumes – Call-detail records may range in number from millions to billions in a single day. This huge data volume must be aggregated into meaningful usage-type information.
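To make the data-volume point concrete, the following is a minimal sketch of rolling raw call-detail records up into hourly usage per cell. The field names (cell_id, timestamp, bytes) and the hourly interval are assumptions chosen for illustration and are not taken from any particular operator's record format. The function consumes records one at a time, so it could be fed from a file or stream rather than an in-memory list.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_usage(records):
    """Roll up an iterable of CDR-like dicts into (cell_id, hour) -> total bytes."""
    totals = defaultdict(int)
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        # Truncate the timestamp to the start of its hour (assumed interval).
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[(rec["cell_id"], hour.isoformat())] += int(rec["bytes"])
    return dict(totals)


# Hypothetical sample records; real CDRs carry many more fields.
sample = [
    {"cell_id": "C1", "timestamp": "2024-01-01T10:05:00", "bytes": "100"},
    {"cell_id": "C1", "timestamp": "2024-01-01T10:45:00", "bytes": "250"},
    {"cell_id": "C2", "timestamp": "2024-01-01T11:00:00", "bytes": "75"},
]
usage = aggregate_usage(sample)
for key, total in sorted(usage.items()):
    print(key, total)
```

At the volumes described above, a single-process loop like this would likely be replaced by a distributed or database-side aggregation, but the roll-up logic stays the same.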
The data integration process typically involves three steps: extract, transform and load:
- Extract data from the sources using the appropriate mechanism, such as SQL queries, file transfers or web services.
- Transform data from the source formats (SQL result sets, text, XML, SOAP) into the appropriate formats for loading; this step can also include validation, data enrichment and correlation across sources. Filtering and aggregation may be necessary to reduce the volume of data and roll it up to the intervals supported within the planning system.
- Load data into the planning system and report errors and data dropout.
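The three steps above can be sketched as a small pipeline. This is an illustrative outline only: the CSV layout, the circuit-name normalization rule and the dictionary standing in for the planning system are all assumptions, not a description of any specific product.

```python
import csv
import io

# Hypothetical source extract: circuit utilization exported as CSV text.
SOURCE_CSV = """circuit,utilization_pct
NYC-BOS-001,71.5
nyc_bos_001,68.0
CHI-DAL-007,not_available
,55.0
"""


def extract(text):
    """Extract: parse raw CSV text into row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize_circuit(name):
    """Map differing naming conventions to one canonical form (assumed rule)."""
    return name.strip().upper().replace("_", "-")


def transform(rows):
    """Transform: normalize names, validate values, collect rejected rows."""
    clean, rejects = [], []
    for row in rows:
        name = normalize_circuit(row["circuit"] or "")
        try:
            value = float(row["utilization_pct"])
        except (TypeError, ValueError):
            rejects.append((row, "non-numeric utilization"))
            continue
        if not name:
            rejects.append((row, "missing circuit name"))
            continue
        clean.append({"circuit": name, "utilization_pct": value})
    return clean, rejects


def load(records, store):
    """Load: write records to the target (a dict stands in for the planning system)."""
    for rec in records:
        store.setdefault(rec["circuit"], []).append(rec["utilization_pct"])
    return len(records)


planning_store = {}
clean, rejects = transform(extract(SOURCE_CSV))
loaded = load(clean, planning_store)
print(f"loaded={loaded} rejected={len(rejects)}")
for row, reason in rejects:
    print("rejected:", reason, row)
```

Keeping rejected rows together with a reason, rather than silently dropping them, is what makes the error and dropout reporting in the load step possible.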
A customizable framework that implements the extract, transform and load (ETL) processes is key to a large-scale data integration effort. Important aspects of this framework are described below. Operators require a framework that:
- Is open and can support multiple extraction mechanisms and multiple data formats. Ideally this framework should support industry-standard scripting mechanisms that can be customized to the specific needs of the data interface and data formats. Customizations can be done by the vendor, a third-party integrator or the end users.
- Factors in source data-quality issues such as broken referential integrity or name mismatches, and provides tools and methodologies to report and address them.
- Can adapt easily to new sources or allow migration of existing data sources to new data repositories when necessary.
- Scales up to processing hundreds of gigabytes of data, as is the case with call-detail records from a tier-one operator.
- Supports the automation and coordination of individual tasks, which can be scheduled to run during low-load conditions.
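One way to picture the openness requirement is a registry in which each source format plugs in its own parser, so a new source can be added without changing the rest of the pipeline. The sketch below is a hypothetical illustration; the registry, the decorator and the sample inventory layouts are assumptions and do not describe any vendor's framework.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Registry mapping a format name to its parser function (hypothetical design).
EXTRACTORS = {}


def extractor(fmt):
    """Decorator that registers a parser function for a source format."""
    def register(func):
        EXTRACTORS[fmt] = func
        return func
    return register


@extractor("csv")
def from_csv(text):
    """Parse a flat-file inventory export into row dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@extractor("xml")
def from_xml(text):
    """Parse an XML inventory document; assumes <node> elements with attributes."""
    root = ET.fromstring(text)
    return [dict(node.attrib) for node in root.iter("node")]


def extract(fmt, text):
    """Dispatch to the parser registered for the given format."""
    try:
        parser = EXTRACTORS[fmt]
    except KeyError:
        raise ValueError(f"no extractor registered for format {fmt!r}") from None
    return parser(text)


csv_rows = extract("csv", "name,site\nRTR-1,Boston\n")
xml_rows = extract("xml", '<inventory><node name="RTR-2" site="Dallas"/></inventory>')
print(csv_rows)
print(xml_rows)
```

Both parsers return the same row shape, so the transform and load steps downstream do not need to know which source a record came from.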
VPIsystems' OnePlan solution provides its own network planning tools featuring an Open ETL framework that is being used in deployments at tier-one operators throughout the world. This framework has been used to integrate with common operations support systems for inventory and performance data, as well as with call-detail record (CDR) and billing data.
Satish Gujar is senior manager, sales engineering at VPIsystems, which provides network analytics for capacity planning, network profitability, business metrics and demand analysis. He has 15 years of enterprise software experience in large-scale, high-availability systems, working with globally distributed development groups and customers. Before joining VPIsystems, Satish was a project lead and senior software engineer at Telcordia. Contact him at satish.gujar@VPIsystems.com.