
Create Consistent Keys

urn:js:virtue:aspire:principle:5.1

TL;DR

Surrogate keys (system-generated unique identifiers) must be assigned when the data is cleansed, conformed and integrated.

Rationale

Surrogate keys (system-generated unique identifiers) must be assigned when the data is cleansed, conformed and integrated. These keys must then remain unchanged in all subsequent downstream layers. They enable data linking and the de-identification of sensitive personally identifying information.

The main drivers are:

  • Enables data linking – Having consistent key values enables the linking of datasets by joining on common data elements.
  • De-identification of sensitive data – Surrogate keys enable the masking, pseudonymisation and/or anonymisation of sensitive personal identifying information.
  • Traceability of data – Consistent keys make it simpler to trace data for lineage and audit purposes.
  • Ease of use – Surrogate keys are simple keys and will consequently result in a simpler join condition.
  • Changes over time – Surrogate keys facilitate the storage and retrieval of record change history over time.
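
The linking and de-identification drivers can be illustrated with a minimal sketch. It assumes an in-memory key registry and a simple conforming rule (trim and upper-case); the names `SurrogateKeyRegistry`, `integrate`, `national_id` and `customer_sk` are hypothetical and are not prescribed by this principle. A real implementation would persist the mapping so that keys stay unchanged across loads.

```python
from itertools import count


class SurrogateKeyRegistry:
    """Hypothetical registry mapping conformed natural keys to surrogate keys."""

    def __init__(self):
        self._by_natural_key = {}
        self._sequence = count(start=1)

    def key_for(self, natural_key: str) -> int:
        # Conform the natural key first so that variants of the same
        # identifier resolve to the same surrogate key (assumed rule).
        conformed = natural_key.strip().upper()
        if conformed not in self._by_natural_key:
            self._by_natural_key[conformed] = next(self._sequence)
        return self._by_natural_key[conformed]


def integrate(rows, registry, id_field="national_id"):
    """Replace the sensitive identifier with a surrogate key in every row."""
    integrated_rows = []
    for row in rows:
        integrated = {k: v for k, v in row.items() if k != id_field}
        integrated["customer_sk"] = registry.key_for(row[id_field])
        integrated_rows.append(integrated)
    return integrated_rows


registry = SurrogateKeyRegistry()

customers = [
    {"national_id": "ab123 ", "name": "Ada"},
    {"national_id": "CD456", "name": "Grace"},
]
orders = [
    {"national_id": "AB123", "amount": 40},
    {"national_id": "cd456", "amount": 15},
    {"national_id": "ab123", "amount": 5},
]

customer_dim = integrate(customers, registry)
order_fact = integrate(orders, registry)

# Downstream layers join on the single surrogate key column only.
totals = {}
for order in order_fact:
    totals[order["customer_sk"]] = totals.get(order["customer_sk"], 0) + order["amount"]
```

In this sketch both datasets share one registry, so the same customer receives the same `customer_sk` in each, and the raw identifier never leaves the integration step.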

Implications

The potential implications are:

  • Key management – A robust mechanism is required for key generation and validation. However, this one-off cost will be rapidly recovered by the benefits outlined above.
  • Tracing masked data – It may be possible to trace masked data back to the underlying raw identifying data. However, this should not be an issue as all user access will be managed via the appropriate data access layer (see Section 4.3.1).
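
Key management covers validation as well as generation. The following is a minimal sketch of the kind of check that could run after each load, assuming the key map is available as a dictionary from natural key to surrogate key. The function name `validate_keys` and the two rules it checks are illustrative assumptions, not part of this principle.

```python
def validate_keys(key_map, downstream_keys):
    """Return a list of problems found; an empty list means the checks passed.

    key_map: dict of natural key -> surrogate key (assumed shape).
    downstream_keys: surrogate keys observed in a downstream layer.
    """
    problems = []

    # Rule 1: a surrogate key must identify exactly one natural key.
    owner_of = {}
    for natural_key, surrogate_key in key_map.items():
        if surrogate_key in owner_of:
            problems.append(
                f"surrogate key {surrogate_key} assigned to both "
                f"{owner_of[surrogate_key]!r} and {natural_key!r}"
            )
        else:
            owner_of[surrogate_key] = natural_key

    # Rule 2: every key used downstream must exist in the key map.
    for surrogate_key in sorted(set(downstream_keys) - set(key_map.values())):
        problems.append(f"downstream key {surrogate_key} has no entry in the key map")

    return problems


clean = validate_keys({"AB123": 1, "CD456": 2}, [1, 2, 2])
broken = validate_keys({"AB123": 1, "CD456": 1}, [1, 3])
```

Here `clean` is an empty list, while `broken` reports one duplicated surrogate key and one downstream key that is missing from the key map.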