Link Search Menu Expand Document

ADW Presentation Layer Model Standards

urn:js:virtue:aspire:standard:20.1

TL;DR

The standards to be followed by Data Architects when creating the ADW Presentation Layer Model.

Definition

The standards for the ADW Presentation Layer Data Model are detailed below. The naming convention for each standard follows this approach:-

Example, PGE-1 means

  • P denotes the Presentation Layer Solution/Physical Data Model
  • GE denotes the General section
  • 1 is the number of the standard.

The shortcut used is clearly indicated in the section names below.

General (GE)

Standard Overview Detail
PGE-1 The ADW Presentation Layer model will be driven by physical considerations. It will be built from the ground up to include:-
  • Decision about whether to roll-up of sub-types or not
  • Physical column decisions, for example data type, precision, time formats
  • Denormalisation for performance reasons
  • Clustering (if necessary)
  • Decision on whether objects are physicalized as Tables or Views
PGE-2 A common set of abbreviations will be used in both the Integration and Presentation Layers. The abbreviations will be mastered within ERWIN.

The abbreviations will apply to non-key columns only.

Tables and Views will not be abbreviated unless the length is greater than 30 characters.
PGE-3 Reference data will be represented in Dimensions which on a case by case basis may contain history or not. Dimensions will be Type 1 or Type 2.
PGE-4 Objects will not have their implementation choice included in their title. View and tables will not be prefixed with VW_ or TB_ to maintain flexibility in the Presentation Layer
PGE-5 Where an object is implemented as a view the full definition of the view will be mastered. Within the Data Modelling tool, all columns for the view will be defined and the SQL syntax will be mastered.

This SQL syntax may be passed from the Developers for non-trivial cases.
PGE-6 Single version of the truth. Data derived from a business rule will be mastered in the Business Data Vault area of the Integration Layer. Metrics, KPI’s, segmentations must be derived in a single place to avoid divergence.
PGE-7 Natural keys(numeric in nature) in PL should be considered as ‘Number’ data type While fetching data from RDV natural key column, if it is numeric in nature, this needs to be casted as ‘NUMBER’ in all PL objects. Currently RDV natural keys are all VARCHAR.

If any data scenario exists where in RDV , non-numeric values are being received instead of numeric ,then it has to be the squad decision whether they want to cast it as numeric in PL or not. Reporting team/data users should be made aware of these decisions.

Whenever a PL objects make use of BDV objects (ie, _BR or _REP objects) such transformations can be performed in BDV itself.
PGE-8 Suffix ‘_TD’ should be used for any column defining time duration Any attribute in RDV/BDV/PL fact or dimension should use suffix ‘_TD’ if the definition says it’s a time duration.
PGE-9 A PL object can reference another PL object It is acceptable to access PL objects for creating another PL object. The aim should be to make such references as simple and transparent as possible.

When creating new PL objects, the PL object select statements should not involve complex logic on the attributes selected.

Eg: Dimension look up is allowed

If however, complex logic does need to be performed on the selected attributes from the PL object then it is suggested that a new BDV object is created that contains transformed data from the PL object. This BDV table (plus any other sources of data) can then be used to create the new PL/BDV object.
Eg: Using a PL metric to derive another metric in PL is not allowed.
PGE-10 Publication of date and/or time columns In the PL publish:
  • If date only is available only a date (_DT) column need be included.
  • If time only (e.g. store opening time) is availabe only a time (_TM) need be included
  • If date and time are available publish:
  1. Date (_DT) and time (_TM) columns using the local/business date and time (i.e. what the customer would see if they looked at the clock on the wall).
  2. A business/local timestamp (TS). Use Snowflake TIMESTAMP_NTZ data type.
  3. A UTC timestamp (_UTC_TS). Use Snowflake TIMESTAMP_NTZ data type.



Dimensions (DM)

Standard Overview Detail
PDM-1 Dimensions will follow a common naming standard. The standard for a dimension is
SCD1 dimension follow below naming convention.
Type 1 dimensions can be built on top of Type 2 dimension (if available)
View should show the latest status of each primary keys. If an SCD2 view exist in PL, then number of unique primary key in both views should be matching.
When a record is deleted from source, RECORD_DELETED_FLAG can be used to show this state. This is an optional field
If SCD1 is implemented as a view, LOAD_TS should be present as technical column in the view. If multiple tables are involved for building the view, take the max(LOAD_TS) from those tables.
See the section on Technical columns for the definition of which are applicable.
All SCDs shall contain default record -2 ‘Not Applicable’. Union the default record with query
PDM-2 The Primary Key will be inherited from the Integration Layer and will follow a common naming standard. The standard for the Primary Key is <Object_Name>_<Datatype Shortcode>, for example CUSTOMER_CD or DATE_DT.
The Primary Key of the Dimension will become the Natural Key from the associated Hub, so for example if the Primary Key of the Hub is a two part Natural Key the Primary Key of the Dimension will also be a two part key.
The <Object Name> will mirror the name of the Dimension less the suffix, so for example DIM_STORE_LOYALTY dimension will have a Primary Key of STORE_LOYALTY_CD.
The <Datatype Shortcode> will be:-
  • _CD for varchar/strings
  • _ID for integers/numbers
  • _DT for dates.
Column names are always in uppercase.
PDM-3 Dimensions in the Presentation Layer may or may not contain history. Type 2 Dimensions will contain the following additional column as part of the Primary Key:-
VALID_FROM_DT

Also this additional attribute will be added:-
VALID_UNTIL_DT

There are default values for Low and High dates as follows:-
Low Date : 01-01-1900
High Date: 31-12-9999

For Dimensions the lowest level at which history is maintained is currently Day level.

Difference between VALID_UNTIL_DT and VALID_FROM_DT of adjacent records of the same primary key shall be 1 day. There shouldn’t be gaps of more than 1 day.

When a record is deleted from source, RECORD_DELETED_FLAG can be used to show this state. This is an optional field

See the section on Technical columns for the definition of which are applicable. If multiple tables are involved for building the view, take the max(LOAD_TS) from those tables.

All SCDs shall contain default record -2 ‘Not Applicable’. Union the default record with query
PDM-4 Dimensions must include the Natural Key The Natural Key will be the Primary Key of the Dimension. It will be visible to Customers so Natural Key selects and joins can be undertaken.
PDM-5 Snowflaking is permitted within Dimensions with a standard pattern for the columns to include. When a Dimension contains a Foreign Key to another Dimension this will be permitted.

To limit the amount of joins at runtime needed after the _PK column has been added the recommendation is for the following columns to be added to the Dimension too:-
  • Natural Key
  • Any other common columns, e.g. a Code and Description
An example is on the DIM_CUSTOMER Dimensions the following columns may be present:-
  • CUSTOMER_CATEGORY_CD
  • CUSTOMER_CATEGORY_DESC.
PDM-6 Technical Columns will exist on all Dimensions. Technical columns will allow the Operations team to track and monitor data loading into the Presentation Layer. See the section on Technical columns for the definition of which are applicable.
PDM-7 The order of columns will be consistent. With a Dimension the columns will always be ordered as follows:-
  • Primary Key(s)
  • Optionally Valid From/To Dates
  • Attributes
  • Technical Columns.
PDM-8 Dimension mandatory and optional columns. The following cardinality must be followed:-
  • Primary Key - Mandatory
  • Valid From/To Dates – Mandatory for SCD Type 2
  • Attributes – Optional
  • Technical Columns – Mandatory.
PDM-9 Extra meta-data will be added for each Dimension to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension:-
  • L5 Subject Area – the subject area within the Presentation Layer that this object is mastered
  • L5 Integration Layer Source Master – the Integration Layer master for this object
  • Solution Model Layer – set to Presentation
  • Dimensional Type
    • Dimension Type 1
    • Dimension Type 2
PDM-10 Extra meta-data will be added for each Dimension Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension Column:-
  • Presentation Layer Column Type
  • Primary Key
  • Dimension Type 1 Attribute
  • Dimension Type 2 Attribute
  • Valid From Date
  • Valid To Date
  • Technical Column
PDM-11 Default values will be present in all Dimensions. The following two rows will be present in all dimensions (including the date and time dimensions):-
  • Unknown, use -1
  • Not Applicable, use -2
For dimensions having composite keys , These record has to be created with -1 , -2 for all keys.
For eg:- DIM_SAMPLE has a Key1 and Key2 which forms composite key for the dimension then
  • -1 has to be inserted to both Key 1 and Key2
  • -2 has to be inserted to both Key 1 and key 2
PDM-23 Dimensions that will or may include multiple banners need to contain additional columns. The existing unique identifier (e.g. STORE_CD) remains as is but the values will be modified to create a unique value to differentiate between the brands. Two columns should be added:
  1. OPERATING_COMPANY_CD - the identifier for the brand/banner (values TBD for Sainsbury's/Argos)
  2. The identifier of the dimension from the brand, e.g. OPERATING_COMPANY_STORE_CD

Sample Dimension Definition
DIM_STORE (Type 1)

Attribute Data Type Mand? Description
STORE_CD VARCHAR Y This will be primary key of the Dimension and is inherited from the Integration Layer.
First part of the Primary Key.
First Attribute Various N Always null cardinality.
Second Attribute Various N Always null cardinality.
...
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
LOAD_TS DATETIME N System Load DateTime for this row.

DIM_STORE (Type 2)

Attribute Data Type Mand? Description
STORE_CD VARCHAR Y This will be primary key of the Dimension and is inherited from the Integration Layer.
First part of the Primary Key.
VALID_FROM_DT INTEGER Y For SCD Type 2 history tracking.
Second part of the Primary Key.
VALID_UNTIL_DT INTEGER Y For SCD Type 2 history tracking.
First Attribute Various N Always null cardinality.
Second Attribute Various N Always null cardinality.
...
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
LOAD_TS DATETIME N System Load DateTime for this row.

Facts and Fact Less Facts (FA)

Standard Overview Detail
PFA-1 Facts will follow a common naming standard. The standard for a Fact is FACT_<object_name>, for example FACT_ITEM_PURCHASE or FACT_PROMOTION_WITHDRAWL.
Fact names are always in uppercase.
PFA-2 Fact Less Facts will follow a common naming standard. The standard for a Fact is FLF_<object_name>, for example FLF_CUSTOMER_OFFER*
Fact Less Fact names are always in uppercase.
PFA-3 Facts and Fact Less Facts must inherit the Primary Keys from the Dimensions. For each Dimension in the fact ensure the _PK column is present in the Fact.
As Dimensions contain history relationships between a Fact and a Dimension require the specification of a date element.
PFA-4 Facts and Fact Less Facts must contain either a Date or Date Timestamp. This denotes that the FACT or the FLF is an event/atomic level fact – i.e. a business event happened on a date or at a time within a date.
When a FACT is at a Date and Time grain the Date and Time must be split into separate columns (Please see PGE-10 for full details)
PFA-5 Facts must contain at least one Measure. At least one Additive or Non-Additive Measure must exist in a Fact.
PFA-6 Fact Less Facts have no Measures. The difference between a Fact and a Fact Less Fact is that a Fact Less Fact contains no Additive or Non-Additive Measures.
PFA-7 Technical Columns will exist on all Facts. Technical columns will allow the Operations team to track and monitor data loading into the Presentation Layer.
PFA-8 The order of columns will be consistent. With a Fact the columns will always be ordered as follows:-
  • Foreign Keys to Dimensions
  • Date or Date and Time (as separate columns)
  • Measures
  • Technical Columns
PFA-9 Dimension mandatory and optional columns. The following cardinality must be followed:-
  • Foreign Keys to Dimensions - Mandatory
  • Date or Date and Time (as separate columns) - Mandatory
  • Measures – Optional
  • Technical Columns – Mandatory
PFA-10 Extra meta-data will be added for each Fact to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension:-
  • L5 Subject Area – the subject area within the Presentation Layer that this object is mastered
  • L5 Integration Layer Source Master – the Integration Layer master for this object
  • Solution Model Layer – set to Presentation
  • Dimensional Type
    • Fact Event Level
    • Fact Less Fact
PFA-11 Extra meta-data will be added for each Fact Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension Column:-
  • Presentation Layer Column Type
    • Primary Key
    • Measure Additive
    • Measure Non-Additive
PFA-12 In Facts all Foreign Key Links are mandatory If a relationship does not exist to a Dimension the following defaults are used:-
  • Unknown, use -1
  • Not Applicable, use -2
PFA-13 Create an indicator (suffix ‘_IND’) column in Fact for any respective flag (suffix ‘_FLAG’) column which contains values restricted to ‘Y’/’N’ In the ‘_IND’ column decode ‘Y’ as 1 and ‘N’ as 0. This will enable the reporting to implement summing logic quite easily.
If the flag column contains values other than ‘Y’,’N’ (e.g. ‘U’ for unknown, ‘NA’ for Not applicable, ‘DQ’ for Data Quality’ etc.) then the column should not be decoded to any numeric value as we should not assign any random numbers other than 0,1 for these flag values.
PFA-14 Separate natural keys should be deployed for DIM_TIME if a link is required to the DIM_TIME dimension If there is a requirement for the fact row to be linked to the DIM_TIME dimension using that datetime attribute, an additional column should be added to the fact table defined as a TIME(9) (natural key) and containing only the time element of the datetime attribute. This will form a natural foreign key to the DIM_TIME dimension on column TIME_TM.
PFA-15 Facts that will or may include multiple banners need to contain an additional column. The existing unique identifier (e.g. STORE_CD) for the dimension FK remains as is but the values will be modified to create a unique value to differentiate between the brands. One column should be added:
  1. OPERATING_COMPANY_CD - the identifier for the brand/banner (values TBD for Sainsbury's/Argos)

Sample Fact Definition
FACT_ITEM_PURCHASE

Attribute Data Type Mand? Description
ITEM_CD VARCHAR Y This will be primary key of the Dimension represented in the Fact.
First part of the Primary Key.
PROMOTION_ID NUMBER Y This will be primary key of the Dimension represented in the Fact.
Second part of the Primary Key.
CUSTOMER_CD VARCHAR Y This will be primary key of the Dimension represented in the Fact.
Third part of the Primary Key.
PURCHASE_DT INTEGER Y Date of the transaction.
Fourth part of the Primary Key.
PURCHASE_TM VARCHAR Y Time of the transaction.
Fifth part of the Primary Key.
First Attribute, e.g. Quantity Various N Always null cardinality. This is an example of a Measure – Additive attribute.
Second Attribute, e.g. Till Number Various N Always null cardinality. This is an example of a Measure – Non-Additive attribute.
...
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
LOAD_TS DATETIME N System Load DateTime for this row.

Aggregates (AG)

Standard Overview Detail
PAG-1 Aggregates can be rolled up data from Facts, or a daily slice of a combination of Fact and Dimensional data. There is no separate type of Snapshot object, if required on a daily basis this becomes a Daily Aggregate for example.
PAG-2 Aggregates will follow a common naming standard. The standard for an Aggregate is AGG_<object_name>_<frequency>, for example AGG_CUST_PURCHASES_D or AGG_PROMO_SUMMARY_M.
Aggregate names are always in uppercase.
PAG-3 Frequency supported are Hourly,Daily, Weekly, Monthly, Quarterly and Yearly. The standard for these <frequency> is:-
_H - Hourly
_D – Daily
_W – Weekly
_M – Monthly
_Q – Quarterly
_Y – Yearly.
PAG-4 Aggregates will have the same standards as Facts but will also include in its Primary Key the link to the reference data about its frequency. As part of the Primary Key of an Aggregate the following columns must be present:-
_H – Hourly – PK of the Hour Dimension
_D – Daily – PK of the Date Dimension
_W – Weekly – PK of the Week Dimension
_M – Monthly – PK of the Month Dimension
_Q – Quarterly – PK of the Quarter Dimension
_Y – Yearly – PK of the Year Dimension
PAG-5 Technical Columns will exist on all Aggregates. Technical columns will allow the Operations team to track and monitor data loading into the Presentation Layer.
PAG-6 The order of columns will be consistent. With a Fact the columns will always be ordered as follows:-
  • Foreign Keys to Dimensions
  • Frequency Foreign Key
  • Measures
  • Technical Columns.
PAG-7 Dimension mandatory and optional columns. The following cardinality must be followed:-
  • Foreign Keys to Dimensions - Mandatory
  • Frequency Foreign Key - Mandatory
  • Measures – Optional
  • Technical Columns – Mandatory
PAG-8 Extra meta-data will be added for each Aggregate to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension:-
  • L5 Subject Area – the subject area within the Presentation Layer that this object is mastered
  • L5 Integration Layer Source Master – the Integration Layer master for this object
  • Solution Model Layer – set to Presentation
  • Dimensional Type
    • Aggregate
PAG-9 Extra meta-data will be added for each Aggregate Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension Column:-
  • Presentation Layer Column Type
    • Primary Key
    • Measure Additive
    • Measure Non-Additive
PAG-10 In Aggregates all Foreign Key Links are mandatory. If a relationship does not exist to a Dimension the following defaults are used:-
  • Unknown, use -1
  • Not Applicable, use -2
PAG-11 Aggregates are different to Extracts. An Aggregate contains the Primary Key and one of more Measures. The naming is consistent with other Facts and Dimensions elsewhere in the Presentation Layer.
An Extract can alter the names of the columns to suit the target audience (may be external), and for ease of access may contain a mixture of Measures and Reference data for ease of access.

Sample Aggregate Definition
AGG_STORE_PROMO_STATUS_M

Attribute Data Type Mand? Description
STORE_CD VARCHAR Y This will be primary key of the Dimension represented in the Aggregate.
First part of the Primary Key.
MONTH_CD VARCHAR Y Frequency of the Aggregation.
Second part of the Primary Key.
PROMO_STATUS_CD VARCHAR Y A segmentation for the Measures.
Third part of the Primary Key.
First Attribute, e.g. Status_Cnt Various N Always null cardinality. This is an example of a Measure – Additive attribute.
...
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
LOAD_TS DATETIME N System Load DateTime for this row.



Extracts (EX)

Standard Overview Detail
PEX-1 Extracts are for external consumers outside of the Presentation Layer. An Extract can alter the names of the columns to suit the target audience (may be external), and for ease of access may contain a mixture of Measures and Reference data for ease of access.
PEX-2 Extracts will follow a common naming standard. The standard for an Extract is EXT_<object_name>_<target>_<frequency>
Target is optional and only required for an external audience
for example EXT_ARI_DATA_D or EXT_PRODUCT_CATALINA_W
Extract names are always in uppercase.
PEX-3 Frequency is optional. An extract may over-write a previous run and just contain the data at a point in time. In this case the <frequency> is not required.
Where a frequency is required it must follow the same naming and Foreign Key standards as an Aggregate.
PEX-4 Attributes can be renamed within Extracts and follow a different naming convention. Attributes can be renamed from the core Presentation Layer equivalents to suit the target audience.
Some extracts may have a fixed naming standard by their nature.
PEX-5 Extracts can be enriched with additional Dimensional values. Additional Dimension attributes can be added to Extracts if needed.
PEX-6 Extra meta-data will be added for each Extract to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension:-
  • L5 Subject Area – the subject area within the Presentation Layer that this object is mastered
  • L5 Integration Layer Source Master – the Integration Layer master for this object
  • Solution Model Layer – set to Presentation
  • Dimensional Type
    • Extract
PEX-7 Extra meta-data will be added for each Extract Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. The following meta-data will be created and populated for every Dimension Column:-
  • Presentation Layer Column Type
    • Primary Key
    • Measure Additive
    • Measure Non-Additive

Sample Extract Definition
EXT_STORE_NOTIFICATION_FEED_M

Attribute Data Type Mand? Description
STORE_CD VARCHAR Y This will be primary key of the Dimension represented in the Extract.
First part of the Primary Key.
MONTH_CD VARCHAR Y Frequency of the Extract.
Second part of the Primary Key.
First Attribute, e.g. Duration_In_Progress Various N Always null cardinality. This is an example of a Measure – Additive attribute.
Second Attribute, e.g. Completion_Date Various N Always null cardinality. This is an example of a Measure – Non-Additive attribute.
...
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
LOAD_TS DATETIME N System Load DateTime for this row.

PII data in Presentation Layer

Below are revised list of standards for PII data in Presentation Layer. It is no longer required to add a prefix/suffix as ‘PII’ to the database object names or columns.
For deprecated standards for PII data, please refer to earlier version of ADW Presentation Layer Model Standards. The deprecated standards for PII data continue to be applicable for existing PII schemas and objects created within these schemas until decommissioned or moved to other schemas.
Any new objects in the Presentation Layer should be created in accordance with the current PII standards.

Standard Overview Detail
PII-1 Clear text PII data will be held in columns classified as PII. The column with PII data should be classified as PII in MUNIN. It is advised that the Data Classification mentioned in the mapping specifications should be in sync with the Data Classifiation provided in MUNIN.
PII-2 Access to PII columns will be restricted. Access to PII columns in database objects will be restricted using appropriate role permissions using Barbossa. The Barbossa roles should be created/configured with appropriate policies to provide access to database objects containing PII data.



Appendix