purebill.com

Stephen Jones writing on billing and application migration

Column - 22 February 2009

Prune old data to constrain an application's size

Summary

Actively pruning an application's databases keeps its infrastructure proportional to the active customer base, rather than to every customer who ever passed through.

Once deployed, a new application moves from holding no data through a growth period as customers sign up or are migrated from an existing system. Eventually it reaches a steady state in which new customer additions balance, or slightly exceed, customer cancellations, and the number of 'active' records stays roughly constant.

The data associated with active customers is almost by definition valuable, and the details associated with cancellations can be valuable for a time. Over the longer term, however, old data accumulates and makes processing slower and more expensive. For example, backups must copy larger volumes, searches must look through or skip more records, and data recoveries take longer.

An application intended only for short-term use may retain 'old' data as an acceptable trade-off against the added complexity of removing it. Conversely, applications built to run long-term must consider how to prune old data, or eventually the 'active' customer records will be just the visible part of a much larger data iceberg.

Design choices that can address this situation include:

  • Immediate removal of cancelled records: The immediate approach can be appropriate if the details are stored elsewhere, can be rederived easily, or have no business requirement for retention.
  • Removal after a parameterised time period: By designing a retention period into the application from the start, data retention can be varied easily to address changing circumstances. Data is retained only for the period assessed as being valuable.
  • Archiving old data offline during its removal: Archive removed data in a format from which it can be retrieved if required. Because the archive does not need to be optimised for immediate access, it can be compressed and is likely to take up less space. This offers a solution when business or audit requirements mandate data retention. Even if retrieval requires some processing effort, most of the data will never be accessed again, and automation can reduce or remove the manual steps required.
  • Removing all records after a time period: Circumstances might allow data for both active and cancelled customers to be removed from an application's database. For example, a book store might remove from online access details of all orders older than 24 months, regardless of the customer's status. This approach can be appropriate when it is impossible to differentiate between an active customer who will make another order, and one who will never visit again.
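
As a rough illustration of the second and third choices above, the following Python sketch archives and then deletes customers whose cancellation date is older than a configurable retention period. The 'customers' table and its columns, the RETENTION_DAYS value and the prune_cancelled function are all hypothetical names chosen for this example, not part of any particular billing system. SQLite and a gzip-compressed file of JSON lines simply stand in for whatever database and archive store an application actually uses.

```python
import gzip
import json
import sqlite3
from datetime import date, timedelta

# Hypothetical retention parameter; in practice this would live in
# configuration so it can be changed without touching the code.
RETENTION_DAYS = 180


def prune_cancelled(conn, today, retention_days, archive_path):
    """Archive, then delete, customers cancelled before the cutoff date.

    Assumes a hypothetical table:
        customers(id INTEGER, name TEXT, cancelled_on TEXT)
    where cancelled_on is NULL for active customers and an ISO date
    string (YYYY-MM-DD) for cancelled ones.
    Returns the number of records removed.
    """
    # ISO date strings sort in date order, so a string comparison is enough.
    cutoff = (today - timedelta(days=retention_days)).isoformat()

    rows = conn.execute(
        "SELECT id, name, cancelled_on FROM customers "
        "WHERE cancelled_on IS NOT NULL AND cancelled_on < ?",
        (cutoff,),
    ).fetchall()
    if not rows:
        return 0

    # Write the archive first: if this step fails, nothing has been deleted.
    # Mode "at" appends text, one JSON object per line, to the gzip file.
    with gzip.open(archive_path, "at", encoding="utf-8") as archive:
        for row_id, name, cancelled_on in rows:
            record = {"id": row_id, "name": name, "cancelled_on": cancelled_on}
            archive.write(json.dumps(record) + "\n")

    # Delete the archived rows in a single transaction.
    with conn:
        conn.executemany(
            "DELETE FROM customers WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)
```

A scheduled job could call prune_cancelled(conn, date.today(), RETENTION_DAYS, archive_path) against the application's database. Dropping the 'cancelled_on IS NOT NULL' condition and filtering on an order date instead would give the fourth choice, where records are removed by age regardless of customer status.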

Actively pruning an application's databases keeps its infrastructure proportional to the active customer base, rather than to every customer who ever passed through.

Comments welcome: stephenjones(at)purebill.com. Stephen Jones © 2004-2010.