Store Record Level Data
urn:js:virtue:aspire:principle:7.1
TL;DR
Data must be stored at the lowest level of granularity available
Rationale
Data must be stored at the lowest level of granularity available. It can subsequently be rolled up as part of the provisioning process. In addition, data must be structured so that it is flexible enough to meet both current and future data requirements, thus promoting reusability.
Note: Storing granular data is the recommended default approach because it offers the most flexibility. Occasionally it may be necessary to store pre-aggregated information; such exceptions are managed through the waiver process.
The strategic benefits of storing data at the record level are as follows:
- Drill-down – Record-level data can always be rolled up as and when needed, but record-level details cannot be derived from pre-aggregated information.
- Future-proof data needs – Storing record-level data not only fulfils current requirements but also provides the flexibility and extensibility to handle future requests with reduced effort.
- Improve data quality process – Record-level data is useful for troubleshooting data quality issues, and it provides lineage back to the underlying data in the source systems.
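The drill-down point can be illustrated with a minimal sketch. The record layout (`store`, `day`, `amount`) and the `roll_up` helper are hypothetical, chosen only for illustration; they are not part of any prescribed implementation.

```python
from collections import defaultdict

# Hypothetical record-level data: one row per sales transaction.
records = [
    {"store": "A", "day": "2024-01-01", "amount": 10.0},
    {"store": "A", "day": "2024-01-01", "amount": 15.0},
    {"store": "A", "day": "2024-01-02", "amount": 5.0},
    {"store": "B", "day": "2024-01-01", "amount": 20.0},
]


def roll_up(rows, keys):
    """Sum `amount` over the given grouping keys (illustrative helper)."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += row["amount"]
    return dict(totals)


# The same stored records serve two different provisioning requests.
by_store_day = roll_up(records, ["store", "day"])
by_store = roll_up(records, ["store"])
```

Both aggregates come from the same stored records, so a new grouping only requires a new roll-up. The reverse does not hold: the `by_store` total for store A does not reveal how many transactions produced it or on which days they occurred.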
Implications
The potential implications are:
- Higher storage costs – Larger data volumes will increase storage costs. This cost will become less of an issue over time as the benefits are realised. The cost of the additional storage is also expected to be significantly lower than the cost of altering the granularity at a later date.
- Extra initial effort – Extra upfront effort may be needed to design, build and test the entities that house the record-level data. However, subsequent rework will be significantly reduced.
- Higher upfront cost – The full cost of storing record-level data will be incurred at the outset, even where this goes beyond the initial requirements. However, this should reduce the need to go back to previous layers, or even to the source system, to meet new granular data requirements.