
Maintain Change History

urn:js:virtue:aspire:principle:3.1

TL;DR

Change history must be maintained for the data elements using the appropriate industry standard techniques.

Rationale

Change history must be maintained for the data elements using the appropriate industry standard techniques.
For example, within the data stores, change history can be maintained at the record level in one of the following ways:

  1. Where change history is not required, the old version of a record can be overwritten by the new version. This should be avoided, as it may not be possible to recreate change history at a later date.
  2. Where full change history is needed, whenever an element of a record changes the old version of that record must be retained (i.e. not overwritten) and a new version created. Over time, the multiple versions of the record constitute the full change history. This is the recommended default approach, as it offers the most flexibility.
  3. Where only the current and previous values of a particular data element are required, the record can be updated to store both values. This offers partial history and may be considered during the detailed design phase.
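
As a rough illustration of option 2, the sketch below keeps every version of a record instead of overwriting it. It is a minimal in-memory example, not a prescribed design: the names `RecordVersion`, `VersionedStore`, `valid_from` and `valid_to` are assumptions made for this example, and a real data store would normally implement the same idea in its own schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RecordVersion:
    """One immutable version of a record (hypothetical structure)."""
    key: str
    data: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


class VersionedStore:
    """Append-only store: an update closes the current version and adds a new one."""

    def __init__(self) -> None:
        self._versions = {}  # key -> list of RecordVersion, oldest first

    def upsert(self, key: str, data: dict, changed_at: datetime) -> None:
        history = self._versions.setdefault(key, [])
        if history:
            current = history[-1]
            if current.data == data:
                return  # nothing changed, so no new version is needed
            # Close the old version rather than overwriting its data.
            history[-1] = RecordVersion(
                current.key, current.data, current.valid_from, changed_at
            )
        history.append(RecordVersion(key, dict(data), changed_at))

    def history(self, key: str) -> list:
        """Return all versions of a record, oldest first."""
        return list(self._versions.get(key, []))


store = VersionedStore()
store.upsert("cust-1", {"city": "Leeds"}, datetime(2023, 1, 1))
store.upsert("cust-1", {"city": "York"}, datetime(2023, 6, 1))
versions = store.history("cust-1")
```

After the two calls, `versions` holds both the closed original version and the current one, which is what makes the full change history recoverable later.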

In certain scenarios, it may be appropriate to maintain change history at the dataset level instead of the record level, as would be the case for received feed files.

The main drivers are:

  • Historic views of data – The key benefit of having the full change history is that it will provide the capability to recreate historic views of the data.
  • Audit purposes – Change history is needed for audit purposes and may also be needed to support certain legal and regulatory requirements.
  • Data provenance & lineage – The change history will support data provenance as well as business and technical lineage of the datasets and data elements.
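
To make the "historic views" driver concrete, here is a minimal sketch of an "as of" lookup over versioned records. The row layout `(key, value, valid_from, valid_to)`, the sample data and the function name `view_as_of` are all assumptions made for illustration; they do not describe any particular data store.

```python
from datetime import datetime

# Hypothetical version rows: (key, value, valid_from, valid_to).
# valid_to=None marks the current version of a record.
VERSIONS = [
    ("cust-1", "Leeds", datetime(2023, 1, 1), datetime(2023, 6, 1)),
    ("cust-1", "York", datetime(2023, 6, 1), None),
    ("cust-2", "Bath", datetime(2023, 3, 1), None),
]


def view_as_of(versions, as_of: datetime) -> dict:
    """Return {key: value} as the data stood at the given point in time."""
    snapshot = {}
    for key, value, valid_from, valid_to in versions:
        # A version applies from valid_from (inclusive) up to valid_to (exclusive).
        if valid_from <= as_of and (valid_to is None or as_of < valid_to):
            snapshot[key] = value
    return snapshot


february_view = view_as_of(VERSIONS, datetime(2023, 2, 1))
july_view = view_as_of(VERSIONS, datetime(2023, 7, 1))
```

With this sample data, `february_view` contains only `cust-1` with its original value, while `july_view` contains the later value of `cust-1` together with `cust-2`. A view like this can only be rebuilt if the superseded versions were retained.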

Implications

The potential implications are:

  • Logical deletes only – Within the retention period, data will not be physically deleted; logical (or soft) deletes will be implemented instead. This gives the capability to recreate historic views of the data.
  • Higher storage costs – There will be higher storage costs due to the increased data volumes. Over time, as the benefits are realised, this cost will become less of an issue.
  • Extra initial effort – Additional effort will be needed to design, build and test the history handling functionality. However, subsequent rework will be significantly reduced.
  • Higher up-front cost – The full cost of developing the history handling functionality, together with the additional storage costs, will be incurred at the outset, even where this goes beyond the initial requirements. However, these costs will be significantly lower than the cost of altering the granularity of the history at a later date.
  • Retention policy – Change history should be stored for the maximum length of time permitted by the data retention policy. After this point in time the records must be deleted automatically.
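
The logical delete and retention implications above can be sketched together as follows. This is only an illustration under stated assumptions: the seven-year `RETENTION` period, the `recorded_at`, `is_deleted` and `deleted_at` field names, and the choice to measure retention from `recorded_at` are all hypothetical. The actual values and rules would come from the data retention policy.

```python
from datetime import datetime, timedelta

# Assumed retention period, for illustration only.
RETENTION = timedelta(days=7 * 365)


def soft_delete(record: dict, deleted_at: datetime) -> dict:
    """Mark a record as logically deleted without removing it."""
    return {**record, "is_deleted": True, "deleted_at": deleted_at}


def purge_expired(records: list, now: datetime) -> list:
    """Keep only records still inside the retention period.

    Assumes retention is measured from the hypothetical recorded_at field.
    """
    return [r for r in records if now - r["recorded_at"] <= RETENTION]


records = [
    {"id": 1, "recorded_at": datetime(2010, 1, 1)},
    {"id": 2, "recorded_at": datetime(2023, 1, 1)},
]
# A business-level delete only flags the record; the row itself is retained.
records[1] = soft_delete(records[1], datetime(2023, 5, 1))
# A scheduled job would then physically remove records past retention.
kept = purge_expired(records, now=datetime(2024, 1, 1))
```

Here the soft-deleted record survives the purge because it is still within the assumed retention period, while the record from 2010 is physically removed.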