Enable Lockbox and Key Rotation
urn:js:virtue:aspire:proposal:4.1
TL;DR
This proposal sets out how to move from SHA256 to HMAC SHA256 hashing, and how to enable key rotation for Lockbox hashed values.
Rationale
The Lockbox capability provides a library to perform HMAC SHA256 hashing, using AWS Secrets Manager as a key store. With ASPIRe we need to move away from plain SHA256 to HMAC SHA256, which will necessitate a full re-hash of our data. This proposal looks at how that can be achieved, and also how we can enable key rotation for Lockbox hashed values.
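A minimal sketch of the difference, using Python's standard `hmac` and `hashlib` modules. In practice the key would be fetched from AWS Secrets Manager rather than hard-coded, and `lockbox_hash` is an illustrative name, not the Lockbox library API:

```python
import hashlib
import hmac

def lockbox_hash(key: bytes, value: str) -> str:
    """HMAC SHA256 hash of a plaintext ID under a (secret) key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Plain SHA256 depends only on the value, so anyone can precompute it
# (e.g. via a lookup table of known IDs).
plain = hashlib.sha256(b"1234567890").hexdigest()

# HMAC SHA256 also depends on a secret key (held in AWS Secrets Manager),
# so the hash cannot be precomputed without access to that key.
keyed = lockbox_hash(b"example-key", "1234567890")

assert plain != keyed
```

Both produce a 64-character hex digest, so existing column sizing for SHA256 values carries over unchanged.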
Why rotate
Rotating keys on a regular basis is considered best practice, and is essential should a key become compromised. Should a Lockbox key become known, we would still be protected by SHA256, but that would put us in a non-compliant position given the requirement to move to HMAC.
Governance have not stipulated rotation as a requirement, but when asked were very much in favour of it as it further improves our security stance.
Modelling standard
Moving to Lockbox can be achieved by adding a new column onto affected objects, changing our code and then performing a historic update before moving across to join using the new values. The historic fix can be performed using encrypted PII values (decrypt and hash), or with temporary rainbow tables built to convert existing SHA256 values to HMAC SHA256.
What this approach does not provide, however, is the ability to rotate Lockbox keys and perform a re-hash in an agile way. To achieve this we can consider holding the plaintext ID value (generally classified as PII) in the same ADW object as the hashed version. Column masking facilitated by Barbossa will protect the ID value, and we can then re-hash using an updated key and an appropriate role through a simple UPDATE statement across the object. Using lineage or other reference data we can then co-ordinate the update across all affected objects by data item. This creates minimal downtime for consumers, and rotation can be controlled at a data item level (i.e. there is no need to re-hash GOL ID if only the Nectar ID key has been compromised).
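To illustrate the rotation pattern above as a sketch (column names and key values here are hypothetical, and `lockbox_hash` stands in for the Lockbox library call): because the plaintext sits alongside the hash in the same object, rotation is a recompute per row rather than a decrypt or rainbow-table exercise.

```python
import hashlib
import hmac

def lockbox_hash(key: bytes, value: str) -> str:
    # Illustrative stand-in for the Lockbox HMAC SHA256 call.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# A row as it might sit in an ADW object once the plaintext ID
# (protected by Barbossa column masking) lives alongside its hash.
row = {"GOL_ID": "1234567890",
       "GOL_ID_HASH": lockbox_hash(b"old-key", "1234567890")}

# Key rotation: with the plaintext in the same object, the re-hash is
# one recompute per row - a simple UPDATE across the object.
row["GOL_ID_HASH"] = lockbox_hash(b"new-key", row["GOL_ID"])

assert row["GOL_ID_HASH"] == lockbox_hash(b"new-key", "1234567890")
assert row["GOL_ID_HASH"] != lockbox_hash(b"old-key", "1234567890")
```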
The proposed modelling standard is therefore that where an object holds a hashed value for the purposes of joining datasets, it also holds the original unhashed value. This original value will be classified as PII (PII-Customer, PII-Colleague) in Munin, so it will be protected using Barbossa with column level masking.
Implementation
If approved, the plan to implement Lockbox will look something like this:
- Identify all ADW objects that hold hashed values, and which data item was used - record the following as reference data:
  - Table name (DB.Schema.Object)
  - Original data item reference (e.g. GOLID)
  - New hashed column name - this will be added alongside the existing hash column and will be the one that remains after migration
  - Plaintext value column name (proposed)
  - Lockbox key reference
- For each data item (e.g. GOL ID) add two new columns to each object: one for the original plaintext value and one for the new hash
  - The new hash column may be numeric, based on ongoing work for performance improvements
- Update Munin to classify the plaintext value column as PII-Customer or PII-Colleague and run Barbossa to apply a column level mask
- Amend ETL code to:
  - Populate the plaintext value column going forward - this may require an adjustment of the data pipeline to deliver the plaintext value into staging
  - Lockbox hash the plaintext value into the new hash column going forward
- Perform a historic update of the plaintext value column using appropriate means (one-time exercise)
- Create a procedure that uses the reference data to identify all objects that need re-hashing for a single ID or a list of IDs - this will Lockbox hash the plaintext value column into the new hash column in each object
- Co-ordinate the run of the procedure with consumers - this will back-populate the new hash column and overwrite any values written since the new ETL code went live
- Validate the hashes through integrity checking (Snowbocrop?)
- Amend upstream processes to use the new hash column
- Remove previous SHA256 hashing from ETL code and optionally remove the old hash column from the objects
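As a sketch of how the reference data could drive the re-hash procedure (the table names, column names and `LOCKBOX_HASH` function are all illustrative assumptions, not existing objects), a Python snippet that generates one UPDATE per affected object for a given set of data items:

```python
# Hypothetical reference data recorded in the first step: one row per
# (object, data item) pair. All names here are illustrative only.
REF = [
    {"table": "DB.SCHEMA.ORDERS", "item": "GOLID",
     "hash_col": "GOL_ID_HASH", "plain_col": "GOL_ID",
     "key_ref": "lockbox/gol"},
    {"table": "DB.SCHEMA.CUSTOMER", "item": "GOLID",
     "hash_col": "GOL_ID_HASH", "plain_col": "GOL_ID",
     "key_ref": "lockbox/gol"},
    {"table": "DB.SCHEMA.LOYALTY", "item": "NECTARID",
     "hash_col": "NECTAR_ID_HASH", "plain_col": "NECTAR_ID",
     "key_ref": "lockbox/nectar"},
]

def rehash_statements(items):
    """One UPDATE per affected object for the given data items.

    LOCKBOX_HASH stands in for whatever UDF or external function
    performs the HMAC SHA256 with the rotated key.
    """
    return [
        f"UPDATE {r['table']} SET {r['hash_col']} = "
        f"LOCKBOX_HASH({r['plain_col']}, '{r['key_ref']}')"
        for r in REF
        if r["item"] in items
    ]

# Rotating only the Nectar ID key touches only the objects that hold it.
stmts = rehash_statements({"NECTARID"})
assert stmts == ["UPDATE DB.SCHEMA.LOYALTY SET NECTAR_ID_HASH = "
                 "LOCKBOX_HASH(NECTAR_ID, 'lockbox/nectar')"]
```

This is where the data-item-level control comes from: a compromised Nectar key re-hashes only the Nectar ID columns, leaving GOL ID objects untouched.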
Dependencies
Getting plaintext PII data into ADW brings some dependencies:
- Barbossa - using Snowflake column masking, Barbossa will enable us to protect these values from general viewing while providing access for activities such as re-hashing
- Loading raw data directly to staging - if we don't present plaintext IDs to staging we will need to decrypt them from curated (see proposal on decoupling curated)
Synergies
There are some synergies between this proposal and the proposed approach for ADW data retention treatment using Heimdallr. Retention treatment for sensitive data held on the various JS accounts (GOL, Nectar, etc.) will be performed on an account number basis. To achieve this in ADW it will be critical that sensitive data that requires treatment is always presented alongside the account number it relates to.
Both Heimdallr and Lockbox will therefore benefit from this approach.
Note that we do not intend to treat the ID values at the 4-year retention point, and a proposal is out to exempt them at 7 years as well. This would ensure the value is always available for re-hashing exercises.
Implications
None.