Enable Lockbox and Key Rotation
urn:js:virtue:aspire:proposal:4.1
TL;DR
This proposal sets out how to move from SHA256 to HMAC SHA256 hashing, and how to enable key rotation for Lockbox hashed values.
Rationale
The Lockbox capability provides a library to perform HMAC SHA256 hashing, using AWS Secrets Manager as a key store. With ASPIRe we need to move away from plain SHA256 to HMAC SHA256, which will necessitate a full re-hash of our data. This proposal looks at how that can be achieved, and also how we can enable key rotation for Lockbox hashed values.
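A minimal sketch of the difference, using Python's standard `hmac` and `hashlib` modules. In practice the key would be fetched from AWS Secrets Manager rather than hard-coded, and `lockbox_hash` is an illustrative name, not the Lockbox library API:

```python
import hashlib
import hmac

def lockbox_hash(key: bytes, value: str) -> str:
    """HMAC SHA256 hash of a plaintext ID under a (secret) key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Plain SHA256 depends only on the value, so anyone can precompute it
# (e.g. via a lookup table of known IDs).
plain = hashlib.sha256(b"1234567890").hexdigest()

# HMAC SHA256 also depends on a secret key (held in AWS Secrets Manager),
# so the hash cannot be precomputed without access to that key.
keyed = lockbox_hash(b"example-key", "1234567890")

assert plain != keyed
```

Both produce a 64-character hex digest, so existing column sizing for SHA256 values carries over unchanged.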
Why rotate
Rotating keys on a regular basis is considered best practice, and is essential should a key become compromised. Should a Lockbox key become known, we would still be protected by SHA256, but that would put us in a non-compliant position given the requirement to move to HMAC.
Governance have not stipulated rotation as a requirement, but when asked were very much in favour of it as it further improves our security stance.
Modelling standard
Moving to Lockbox can be achieved by adding a new column onto affected objects, changing our code and then performing a historic update before moving across to join using the new values. The historic fix can be performed using encrypted PII values (decrypt and hash), or with temporary rainbow tables built to convert existing SHA256 values to HMAC SHA256.
What this approach does not provide, however, is the ability to rotate Lockbox keys and perform a re-hash in an agile way. To achieve this we can consider holding the plaintext ID value (generally classified as PII) in the same ADW object as the hashed version. Column masking facilitated by Barbossa will protect the ID value, and we can then re-hash using an updated key and an appropriate role through a simple UPDATE statement across the object. Using lineage or other reference data we can then co-ordinate the update across all affected objects by data item. This creates minimal downtime for consumers, and rotation can be controlled at a data item level (i.e. there is no need to re-hash GOL ID if only the Nectar ID key has been compromised).
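To illustrate the rotation pattern above as a sketch (column names and key values here are hypothetical, and `lockbox_hash` stands in for the Lockbox library call): because the plaintext sits alongside the hash in the same object, rotation is a recompute per row rather than a decrypt or rainbow-table exercise.

```python
import hashlib
import hmac

def lockbox_hash(key: bytes, value: str) -> str:
    # Illustrative stand-in for the Lockbox HMAC SHA256 call.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# A row as it might sit in an ADW object once the plaintext ID
# (protected by Barbossa column masking) lives alongside its hash.
row = {"GOL_ID": "1234567890",
       "GOL_ID_HASH": lockbox_hash(b"old-key", "1234567890")}

# Key rotation: with the plaintext in the same object, the re-hash is
# one recompute per row - a simple UPDATE across the object.
row["GOL_ID_HASH"] = lockbox_hash(b"new-key", row["GOL_ID"])

assert row["GOL_ID_HASH"] == lockbox_hash(b"new-key", "1234567890")
assert row["GOL_ID_HASH"] != lockbox_hash(b"old-key", "1234567890")
```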
The proposed modelling standard is therefore that where an object holds a hashed value for the purposes of joining datasets, it also holds the original unhashed value. This original value will be classified as PII (PII-Customer, PII-Colleague) in Munin, so it will be protected using Barbossa with column level masking.
Implementation
If approved, the plan to implement Lockbox will look something like this:
- Identify all ADW objects that hold hashed values, and which data item was used - record the following as reference data:
  - Table name (DB.Schema.Object)
  - Original data item reference (e.g. GOLID)
  - New hashed column name - this will be added alongside the existing hash column and will be the one that remains after migration
  - Plaintext value column name (proposed)
  - Lockbox key reference
- For each data item (e.g. GOL ID) add two new columns to each object: one for the original plaintext value and one for the new hash
  - The new hash column may be numeric, based on ongoing work for performance improvements
- Update Munin to classify the plaintext value column as PII-Customer or PII-Colleague and run Barbossa to apply a column level mask
- Amend ETL code to:
  - Populate the plaintext value column going forward - this may require an adjustment of the data pipeline to deliver the plaintext value into staging
  - Lockbox hash the plaintext value into the new hash column going forward
- Perform a historic update of the plaintext value column using appropriate means (one-time exercise)
- Create a procedure that uses the reference data to identify all objects that need re-hashing for a single ID or a list of IDs - this will Lockbox hash the plaintext value column into the new hash column in each object
- Co-ordinate the run of the procedure with consumers - this will back-populate the new hash column and overwrite any values written since the new ETL code went live
- Validate the hashes through integrity checking (Snowbocrop?)
- Amend upstream processes to use the new hash column
- Remove previous SHA256 hashing from ETL code and optionally remove the old hash column from the objects
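As a sketch of how the reference data could drive the re-hash procedure (the table names, column names and `LOCKBOX_HASH` function are all illustrative assumptions, not existing objects), a Python snippet that generates one UPDATE per affected object for a given set of data items:

```python
# Hypothetical reference data recorded in the first step: one row per
# (object, data item) pair. All names here are illustrative only.
REF = [
    {"table": "DB.SCHEMA.ORDERS", "item": "GOLID",
     "hash_col": "GOL_ID_HASH", "plain_col": "GOL_ID",
     "key_ref": "lockbox/gol"},
    {"table": "DB.SCHEMA.CUSTOMER", "item": "GOLID",
     "hash_col": "GOL_ID_HASH", "plain_col": "GOL_ID",
     "key_ref": "lockbox/gol"},
    {"table": "DB.SCHEMA.LOYALTY", "item": "NECTARID",
     "hash_col": "NECTAR_ID_HASH", "plain_col": "NECTAR_ID",
     "key_ref": "lockbox/nectar"},
]

def rehash_statements(items):
    """One UPDATE per affected object for the given data items.

    LOCKBOX_HASH stands in for whatever UDF or external function
    performs the HMAC SHA256 with the rotated key.
    """
    return [
        f"UPDATE {r['table']} SET {r['hash_col']} = "
        f"LOCKBOX_HASH({r['plain_col']}, '{r['key_ref']}')"
        for r in REF
        if r["item"] in items
    ]

# Rotating only the Nectar ID key touches only the objects that hold it.
stmts = rehash_statements({"NECTARID"})
assert stmts == ["UPDATE DB.SCHEMA.LOYALTY SET NECTAR_ID_HASH = "
                 "LOCKBOX_HASH(NECTAR_ID, 'lockbox/nectar')"]
```

This is where the data-item-level control comes from: a compromised Nectar key re-hashes only the Nectar ID columns, leaving GOL ID objects untouched.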
Dependencies
Getting plaintext PII data into ADW brings some dependencies:
- Barbossa - using Snowflake column masking, Barbossa will enable us to protect these values from general viewing while providing access for activities such as re-hashing
- Loading raw data directly to staging - if we don't present plaintext IDs to staging we will need to decrypt them from curated (see proposal on decoupling curated)
Synergies
There are some synergies between this proposal and the proposed approach for ADW data retention treatment using Heimdallr. Retention treatment for sensitive data held on the various JS accounts (GOL, Nectar, etc.) will be performed on an account number basis. To achieve this in ADW it will be critical that sensitive data that requires treatment is always presented alongside the account number it relates to.
Both Heimdallr and Lockbox will therefore benefit from this approach.
Note that we do not intend to treat the ID values at the 4-year retention point, and a proposal is out to exempt them at 7 years as well. This would ensure the value is always available for re-hashing exercises.
Implications
None.