purebill.com Stephen Jones writing on billing and application migration
Column - 25 March 2006

Archiving billing data for the long-term

Summary

Billing receives, stores and generates high volumes of customer and transaction data in support of related business processes, customer inquiries and external reviews. Aside from billing, audit, marketing, fraud, revenue assurance and R&D are all potential consumers of (archived) billing data.

Archived billing data must be stored cost-effectively - commodity hardware, open source software and non-proprietary formats (e.g. based on XML) provide options that have not been available historically. Commodity hardware reduces the infrastructure costs of deployment and archive-related processing. Open source / free software and well documented open formats ensure that data can be interpreted on future hardware / software platforms.

Why archive billing data?

Whilst the specifics of billing differ by industry, billing-related systems receive and generate 'high' volumes of data that must be retained to comply with corporate retention policies (legal, financial, governance), to answer customer inquiries, for use in product development research, or for retrieval and use by related business processes.

The data volumes each industry considers 'high' will differ depending on their historical transaction load, and these loads change as new technologies are introduced. For example, residential telecommunication services generate billions of transactions per month at an average rate of around 100 transactions per service per month, while utilities generate one meter reading per month or quarter (e.g. residential properties).

In the telecommunications industry, an ISP's use of data metering and capturing URL 'click-streams' generates a new set of 'high volume', frequently changing, intra-day transaction totals and logs that must be available for (short-term) inquiries before being stored (archived) to make way for subsequent readings.
Electricity utilities introducing 'smart meters' into residential properties must capture and process a 'high volume' of readings (measured 'per quarter hour') for each customer household (96 readings per day, or ~3,000 per month per meter). The processes capturing existing transaction loads must be resized to address these higher future volumes.

Existing billing systems must address this increase in data volumes (from an archiving perspective) by moving older data from more expensive operational platforms, where it can be referenced online quickly, to slower, cheaper, high-volume archives that can hold the data until it is required or (later) deleted. This may be done in stages, forming a multi-level archive that reflects the decreasing access frequency of older data and the cheaper options available from slower and offline storage.

Archiving + Retrieval = System

It is not sufficient to just archive data if there is no mechanism to retrieve it. Without the retrieval step, the data may just as well have been deleted. Long-term retrieval approaches must be adaptable to address future access requirements by additional business functions and operational platforms.

Billers must consider the end-to-end processing required to build, maintain and support an archiving platform. The initial data sources archived may change their structure, additional data sources may be added, and the infrastructure vendors underlying the archive platform may disappear, merge or change their offerings.

When data is retrieved it may be placed in a separate, temporary environment for the duration of its 'retrieved use' before its subsequent deletion (to minimise storage - deleted data can always be retrieved again). An alternative solution could place it back in its original datastore, but this would 'pollute' current data with 'old' data and complicate its removal when no longer required. Archived data may also be stored based on an older (datastore) structure that is no longer supported.
A temporary environment can support these older structures more easily without complicating the existing operational environment.

What billing data might be archived?

The data that will be archived is often not a complete set of all the data available, but instead is a specific and purposefully selected subset that has business value or is required by external parties (e.g. industry regulators, tax departments). A key differentiator between an 'archive' and a 'backup' is that archived data is 'moved' (and deleted) from its current location, whilst backed up data is 'copied' (and retained).

Usage details - The volume of customer data is relatively static when compared to the volume of generated network usage, which can be orders of magnitude higher than the 'single transaction a month' that a recurring subscription reflects. From a storage perspective, unbilled network usage can remain in the billing system's prebill database until it is extracted and 'billed' (e.g. for up to three months). Once network usage has been billed (postpaid), or a statement generated (prepaid), usage details are referenced less frequently by billing, but may still be needed by related processing such as revenue assurance, audit or IT problem resolution. Archiving moves this 'billed' data out of the active operational environment, minimising file system and database storage, to a place where it can be retrieved if and when required.

Billing details - As network usage is billed, its data volume issues are duplicated in the bill image, especially where individual network transactions are itemised on the bill. Bill details support customers' review of their bills and, when stored by the biller for subsequent retrieval, allow the biller's staff to view the same details as customers making inquiries / raising disputes. Billers using EBPP will retrieve and present bill details to their customers (and possibly staff) from a 'bill detail repository'.
This repository's data volumes must also be managed to minimise the operational costs associated with storage.

The most appropriate estimate of when to archive bills will depend on the biller and their industry. Telecommunications, with its higher transaction load, is more likely to archive bills earlier than a biller operating in a predominantly subscription-based industry (cable TV, magazines, ambulance subscriptions). Appropriate timeframes can be estimated by measuring the age of bills referenced by front-of-house staff (e.g. short-term data capture using paper logs).

Inquiries against bill details often have an initial peak soon after bills are sent to customers (by whatever channel), remain 'relatively' high during the period before a bill's due date as customers re-reference their bills, and then have a lesser peak around the bill's due date. The inquiry volumes after these times are substantially lower - customers have already paid their bills and have little reason to revisit old transactions and payments.

Billing system data migration - An older billing platform replaced by a contemporary equivalent might have its oldest customer data (especially for closed accounts) archived and stored offline. This approach allows older data to be retained for use whilst minimising the data volumes that are converted to the new platform (and carried forward until the 'next' billing system replacement...). Properly documented, data can be extracted from its existing (legacy) databases and stored in an open and accessible (if biller-specific) format where it can be retrieved for subsequent use. Whilst this data may be accessed infrequently, it may be useful whilst the platform migration is in progress (what were the original values for this customer?), or where litigation relates to older customer data not required on the new billing platform.

Items to consider when archiving

Archived data has different properties to actively updated customer and transaction data.
For example, actively updated data generates higher operational costs than archived data since regular back-ups must be performed to support the data's disaster recovery. Archived data does not change, and once an offsite copy has been made, the disaster recovery costs are much lower.

Time to access - Will the archived data be stored using offline media (tape, disconnected disk, optical disk)? Will the data be stored onsite (as well as offsite for disaster recovery purposes...) so that access is relatively prompt (once the correct media is located)? How will the archived data be indexed? How will the index be accessed and maintained?

Speed of retrieval - Once archived data has been retrieved from offline storage, it must be accessed and extracted for its intended use. Where details for individual customers are required and the volumes are low, custom retrieval scripts / software may be developed (or redeveloped) to extract the customer fields required. Where higher volumes of repeating retrievals are made, a more formal retrieval mechanism may be employed.

Data compression - To minimise the volume of data that must be backed up, files may be compressed before they are placed into the 'archives'. This will affect the data processing capacity required to compress and store the data, the software involved (i.e. solutions available over the long term), and the data processing required to access the data (decompression).

Long-term formats - Archived data records must be stored in formats that can be read for extended timeframes measured in years, possibly with a longer life than the billing systems that generate them. Two projects utilising long-term formats in non-billing domains are OASIS and Geoscience Australia:
Long-term formats must be independent of the software that creates and interprets them. This ensures that when the data is required in the future, even where the original software is unavailable, the data can be retrieved using contemporary toolsets. Well documented open formats that can be referenced using open source / free software toolsets will maximise the future availability / accessibility of the data by separating access from dependency on a specific vendor.

Different format versions over time - Over time, record formats will change as new fields are required in support of new products, services and needs not envisaged at the beginning of a format's design. For example, phone call records have had to be extended to address the needs of mobile phones, non-call transactions such as SMS, and very large content catalogs (e.g. music, ringtones and movies).

Billing details may be received from networks and upstream applications in biller-specific formats (or more recently in billing software vendor-specific formats), but these proprietary details can be cleanly documented for archiving purposes. As different network transaction and vendor software versions are deployed, the formats and their documentation can be updated to support ongoing access.

Descriptive details about the archive (metadata) must be captured at the point of archiving, reflecting the data's structure, meaning and relationships present when the archive was generated. These details change over time, and are crucial to understand when older archives are accessed.

Retention periods - The retention period of each data source is an important parameter affecting how processing is performed within an archive. For example, if datastores with different retention periods are archived together, the datastore with the shorter retention period may be retained for longer than anticipated (until the longest retention period expires).
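The retention point can be made concrete with a small Python sketch. It is illustrative only: the source names, the 2 and 7 year retention periods and the helper names are assumptions, not figures or tooling from this column. It shows that a combined archive can only be deleted once its longest-lived source expires, and how long each shorter-lived source is held beyond its own policy as a result.

```python
from dataclasses import dataclass
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass(frozen=True)
class ArchivedSource:
    """One data source placed in an archive (hypothetical structure)."""
    name: str
    archived_on: date
    retention_years: int  # assumed to come from the biller's retention policy

    def expires_on(self) -> date:
        # Earliest date this source may be deleted under its own policy.
        return add_years(self.archived_on, self.retention_years)


def archive_expiry(sources) -> date:
    """A combined archive is deletable only when its longest-lived source expires."""
    return max(s.expires_on() for s in sources)


def over_retention_days(sources) -> dict:
    """Days each source is held beyond its own policy when archived together."""
    combined = archive_expiry(sources)
    return {s.name: (combined - s.expires_on()).days for s in sources}


if __name__ == "__main__":
    archived = date(2006, 3, 25)
    sources = [
        ArchivedSource("usage_records", archived, 2),  # assumed 2 year policy
        ArchivedSource("bill_images", archived, 7),    # assumed 7 year policy
    ]
    print("archive deletable on:", archive_expiry(sources))
    print("held beyond policy (days):", over_retention_days(sources))
```

Archiving each source separately (or grouping only sources that share a retention period) brings the over-retention for every source to zero, at the cost of managing more archive sets.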
Tags: Billing, Archiving, Retrieval, Bill Detail, Usage
Comments welcome: stephenjones(at)purebill.com | Stephen Jones © 2004-2010