purebill.com Stephen Jones writing on billing and application migration
Column - 27 July 2008

Architectures for future billing systems?

Summary

New systems are constructed from the contemporary technologies and ideas present at their initial design, with new capabilities implemented around their original architecture. Today's (billing) systems were built on the technologies and ideas present a decade or more ago. What could systems built from scratch today use in their base architecture within the areas of storage and computation?

A contemporary storage architecture could start with the ideas present in the Google File System, available outside Google in the form of the Hadoop open source project. Contemporary computation architectures could be built with the ideas from the BOINC project, Google's map/reduce approach, and from the mainstreaming of computing clusters such as render farms and Beowulf clusters built from multi-core commodity hardware. These architectures leverage the pricing and scaling benefits of commodity parallel processing and mass storage.

Key Architectural Requirements

Current and future billing systems are likely to capture greater detail from the increasing number of transactions they must process, with these additional details used for billing differentiation and post-billing analysis. This requirement will drive increased storage to hold not only the data to support the immediate needs of billing, but also historical details (the longer the better) to support both customer inquiries (including self-service) and internal data mining (e.g. marketing and fraud). Initial examples triggering these increasing storage needs include the segmentation of content 'bytes' within mobile phone data metering, and intra-day metering in the electricity industry (enabling measurement and pricing of peak-period energy consumption).
In addition to storage, increased pricing sophistication is being asked of the billing process, driving additional computational effort. Though Moore's Law has been delivering improvements, enterprise billing systems commonly require multi-server architectures to support them. Once the 'need' to deploy multiple servers has been established, the design question becomes one of how to organise the servers to meet both today's needs and allow easy scaling for tomorrow.

Data Storage Architecture

Allocation: Existing data storage systems operate on the presumption that storage is a scarce and expensive resource which must be allocated precisely, and constrained to avoid overuse. However, the actual pricing and sizing of storage (disk) hardware is moving in the opposite direction, as the cost per gigabyte is trending towards zero. This trend allows companies to offer 'unlimited' storage for applications such as e-mail, supported by 'cents in the dollar' advertising instead of more expensive subscriptions.

Storage allocations for new systems often require an 'accurate' estimate of future needs before the system goes live, followed later by an effort to tune (increase) the storage allocation, which requires a process of business case justification. File allocations measured in terabytes can be seen as a 'big deal' that must be carefully planned and judiciously reviewed.

The Google File System (GFS) is an alternative architecture that leverages the storage attached to computing servers (e.g. 200GB+ per server) as a redundant disk farm used to store source and result details, along with intermediate working files produced during computation. As Google adds new servers, additional storage is provided as a side effect and becomes available as a global resource for data storage. Since new servers come with newer, larger disks, the total storage increases over time independently of any increase in server count.
Whilst having some large physical limit, the GFS provides multi-terabyte/petabyte data storage for Google processing, and the demonstration examples mentioned in Google's various whitepapers commonly cite the processing of multi-terabyte data sources as 'small' examples of 'what can be done'.

Hadoop is an open source project providing an alternative to the (proprietary) GFS for users outside Google. The project's aim is to provide GFS-style storage and Map/Reduce computing infrastructure outside Google; indeed, one of the major users of the system is Yahoo, who use it in their search index processing. By implementing Hadoop, the computing and storage benefits enjoyed by Google can be utilised broadly and applied to a wider range of problems, possibly including billing.

Performance: The GFS delivers high-performance storage, as demonstrated by the numbers in Google's Map/Reduce whitepaper. The ability to index the Internet repeatedly in time periods measured in hours and days demonstrates the storage and retrieval performance of GFS against multi-petabyte data stores. The GFS whitepaper also indicates that the default setting for data redundancy at that level of performance is 'three copies', spread across different servers and racks so that the failure of any one of them does not lose data.

Cost: The GFS also indicates the technology direction for hardware. Over the long term, the cost of disk storage has fallen from multiple dollars per gigabyte to fractions of a dollar per gigabyte, and this trend will continue until the cost is measured in cents. As storage becomes cents per gigabyte, the 'decision costs' of how much to purchase and how to allocate it for use will quickly exceed the cost of the disks themselves. It will be easier (cheaper) to purchase hundreds of terabytes rather than make decisions about 'exactly' how much should be purchased. The cost of not having 'enough' will exceed the cost of having 'too much'.
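The 'decision cost' argument can be made concrete with a small calculation. The sketch below is illustrative only: the function name and every figure passed to it (surplus terabytes, price per gigabyte, review hours, hourly rate) are hypothetical assumptions, not numbers taken from this column or from Google's whitepapers.

```python
def cheaper_to_overbuy(surplus_tb, price_per_gb, review_hours, hourly_rate):
    """Return True when buying surplus storage costs less than the
    staff time spent deciding 'exactly' how much storage to buy.

    All four inputs are assumptions supplied by the caller.
    """
    surplus_cost = surplus_tb * 1000 * price_per_gb  # 1 TB taken as 1000 GB
    decision_cost = review_hours * hourly_rate
    return surplus_cost < decision_cost
```

With assumed inputs of 10 surplus terabytes at 0.10 per gigabyte, set against 40 hours of review at 100 per hour, the surplus disks cost about 1,000 and the review costs 4,000, so the function returns True. At 2.00 per gigabyte the same surplus costs 20,000 and careful sizing is still worth the effort.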
The process of adding storage will also need to be improved to avoid its cost overwhelming the cost of the hardware. Adam Jacob talked about 'automated infrastructure' for start-ups at a recent Web 2.0 Expo in San Francisco, but this approach should apply just as much to larger organisations who will compete against those start-ups.

Backup: Backup will be addressed by replicating data across multiple geographic sites. Offline tape storage may become less common as storing data online (with multiple copies) becomes easier and cheaper, and as retrieval 'at speed' becomes more important. The GFS whitepaper indicates that three copies of data have been sufficient to ensure that Google have not lost information they subsequently required.

The ability to add storage as and when required suggests storage will be purchased in bulk and used as required, rather than on a 'per project' basis. As new storage is implemented, the oldest storage from a few years ago may be taken offline, since it will likely be smaller and more expensive to run in terms of 'performance per watt'. The focus on energy costs per unit of processing and storage will become more important and may drive (and fund) the use of contemporary hardware over the retention of older equipment.

Computing: The additional benefit that GFS provides is that the storage infrastructure contributes to, and is a side effect of, Google's computing infrastructure. The same servers that support business computation also support the GFS storage needs. The collocation of computation and data storage reduces the volume of data that must be shipped around Google's internal networks.

Parallel Computing

Billing, as an embarrassingly parallel problem, is ripe for distribution across a collection of servers. The processing of one customer usually bears no relationship to the processing of the next customer, allowing them to be performed concurrently across different computing platforms.
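As a minimal sketch of this per-customer independence, the Python below bills each customer as a separate task. The tariff, the function names and the customer identifiers are all hypothetical, and a local thread pool stands in for the collection of servers; a real deployment would spread the same independent tasks across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical flat tariff: price per usage unit, in cents.
RATE_CENTS_PER_UNIT = 5


def bill_customer(customer_id, usage_units):
    """Rate one customer's usage; needs no data from any other customer."""
    return customer_id, sum(usage_units) * RATE_CENTS_PER_UNIT


def bill_all(usage_by_customer, workers=4):
    """Bill every customer concurrently and collect the totals.

    Each call to bill_customer is independent of the others, which is
    what makes the problem embarrassingly parallel.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(bill_customer, cid, units)
                   for cid, units in usage_by_customer.items()]
        return dict(f.result() for f in futures)
```

Calling `bill_all({"cust-001": [10, 20], "cust-002": [5]})` returns a total in cents for each customer, and the result does not depend on the order in which the tasks happen to finish.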
Many related functions of billing can be performed concurrently, with examples being rating, the billing cycle, revenue assurance, fraud detection, financial processing, and mediation.

Parallel processing allows a large problem to be broken into parts and processed concurrently across multiple CPUs, especially where these are multi-core. Since billing is embarrassingly parallel, much of it can be addressed by dividing its constituent processes into steps that can be distributed. By designing the processing to take advantage of this, large problems can be solved in a shorter amount of time, and commodity (or at least less specialised) hardware can be employed to provide the raw computing resources.

By dividing problems and processing them concurrently, the time to completion can be shortened, and a larger data volume can be processed within a given time period. This structure can also reduce the dependency of processing on any one server, allowing for higher availability.

Capacity can be redeployed as required, including load balancing between systems that peak at different times. When multiple applications share the same infrastructure and distribute their workload across multiple servers, the needs of different application types can be balanced. Applications that are CPU intensive can be balanced against those that are I/O intensive to provide better utilisation of the infrastructure.

Alternate Computing Architectures

The question is one of how to distribute the workload across multiple servers. Three current models for processing parallel workloads are Google's computing platform (the Google File System with Map/Reduce), Beowulf clusters, and the BOINC project.
Google's computing platform is composed of commodity white-label boxes, designed to their specification, which employ standard storage hardware with a layer of Google software on top (e.g. a tuned Google Linux distribution using the Google File System teamed with the Google Map/Reduce software infrastructure). As new servers are added, the amount of work that can be performed increases. Some servers could be reserved for specific projects, applications, or special purposes (such as big-memory sorting). By assembling a collection of servers whose workload is not dedicated to any one application, and whose storage is not dedicated to any one purpose, a general computing resource can be built.

Beowulf clusters have been used for many years to perform weather simulation and physics calculations, and have been demonstrated at high capacity over periods of months and years. The ability of Beowulf clusters to run on basic commodity-style hardware has been well demonstrated, though data-intensive processing problems sometimes require high-performance (proprietary) interconnect infrastructure to support their I/O needs.

The BOINC project provides a mechanism for subscribers to donate spare CPU cycles on their existing computer infrastructure to support computationally expensive (but low-resource) projects such as SETI@home, weather simulation and protein folding for new drugs. The BOINC project has to date been a community-based effort, but with the BOINC infrastructure being open source, it could be adapted for use internally within an organisation.

Possible Solution: Using BOINC for billing

BOINC processing packages up the work to be performed, and then parcels it out 'on demand' to people's PCs (and servers) around the world. When a PC has completed a work unit, its results are dispatched back to the central (BOINC) project servers for further processing.
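The package, parcel out and report back cycle described above can be sketched in a few lines of Python. This is not BOINC's actual API: `make_work_units`, `ProjectServer` and `run_worker` are hypothetical names invented for illustration, and everything runs in one process rather than over a network.

```python
from collections import deque


def make_work_units(customer_ids, unit_size):
    """Package a list of customers into fixed-size work units."""
    return [customer_ids[i:i + unit_size]
            for i in range(0, len(customer_ids), unit_size)]


class ProjectServer:
    """Hypothetical stand-in for a central project server."""

    def __init__(self, work_units):
        # Number each unit so results can be matched back to it.
        self.pending = deque(enumerate(work_units))
        self.results = {}

    def request_work(self):
        """Hand out the next unit on demand, or None when out of work."""
        return self.pending.popleft() if self.pending else None

    def report_result(self, unit_id, result):
        """Record the result a worker sends back for one unit."""
        self.results[unit_id] = result


def run_worker(server, process):
    """Pull, process and report work units until the server runs dry."""
    completed = 0
    while (job := server.request_work()) is not None:
        unit_id, unit = job
        server.report_result(unit_id, process(unit))
        completed += 1
    return completed
```

Because workers pull work rather than having it pushed to them, a faster server simply comes back for its next unit sooner, and no central scheduler needs to know how quick each machine is.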
A PC can hold more than one work unit, and on completing one will immediately start the next whilst it reports back. Each server can subscribe to multiple 'projects' to fully utilise the processing time available, subject to the ability of the originating projects to provide work and receive results.

A similar infrastructure could be used within a business to parcel work into manageable units with different response times and timescales. For example, billing could be parcelled up into units that each require an hour or two of processing, and these could be distributed amongst the servers performing work. Servers will get through their work at different speeds, and where the work to be performed is I/O bound, multiple work units could be run concurrently on the same server. This would allow work to be performed on demand, with servers working as hard as work can be supplied to them.

When one project is unavailable (or out of work), the computing resources that would otherwise remain idle can be employed by other projects to get their work done. In support of this, each server subscribes to a list of projects and works through them in a predetermined priority order. Work servers periodically contact each project's servers to check for work, and when work becomes available they resume their highest-priority workload.

The amount of resource dedicated to each project may also be varied by time of day and day of week. Processing that occurs weekly may be allocated to servers that are under-utilised during the weekend. Likewise, servers used for daytime processing can be used at night to perform general processing.

Tags: Billing, Architecture, BOINC, Google File System, Google MapReduce, Performance, Storage, Concurrency, cluster
Comments welcome: stephenjones(at)purebill.com | Stephen Jones © 2004-2010