ADW Presentation Layer Model Standards
urn:js:virtue:aspire:standard:20.1
TL;DR
The standards to be followed by Data Architects when creating the ADW Presentation Layer Model.
Definition
The standards for the ADW Presentation Layer Data Model are detailed below. The naming convention for each standard follows this approach:-
Example, PGE-1 means
- P denotes the Presentation Layer Solution/Physical Data Model
- GE denotes the General section
- 1 is the number of the standard.
The shortcut used is clearly indicated in the section names below.
General (GE)
| Standard | Overview | Detail |
|---|---|---|
| PGE-1 | The ADW Presentation Layer model will be driven by physical considerations. | It will be built from the ground up to include:-
|
| PGE-2 | A common set of abbreviations will be used in both the Integration and Presentation Layers. | The abbreviations will be mastered within ERWIN. The abbreviations will apply to non-key columns only. Tables and Views will not be abbreviated unless the length is greater than 30 characters. |
| PGE-3 | Reference data will be represented in Dimensions which on a case by case basis may contain history or not. | Dimensions will be Type 1 or Type 2. |
| PGE-4 | Objects will not have their implementation choice included in their title. | View and tables will not be prefixed with VW_ or TB_ to maintain flexibility in the Presentation Layer |
| PGE-5 | Where an object is implemented as a view the full definition of the view will be mastered. | Within the Data Modelling tool, all columns for the view will be defined and the SQL syntax will be mastered. This SQL syntax may be passed from the Developers for non-trivial cases. |
| PGE-6 | Single version of the truth. | Data derived from a business rule will be mastered in the Business Data Vault area of the Integration Layer. Metrics, KPI’s, segmentations must be derived in a single place to avoid divergence. |
| PGE-7 | Natural keys(numeric in nature) in PL should be considered as ‘Number’ data type | While fetching data from RDV natural key column, if it is numeric in nature, this needs to be casted as ‘NUMBER’ in all PL objects. Currently RDV natural keys are all VARCHAR. If any data scenario exists where in RDV , non-numeric values are being received instead of numeric ,then it has to be the squad decision whether they want to cast it as numeric in PL or not. Reporting team/data users should be made aware of these decisions. Whenever a PL objects make use of BDV objects (ie, _BR or _REP objects) such transformations can be performed in BDV itself. |
| PGE-8 | Suffix ‘_TD’ should be used for any column defining time duration | Any attribute in RDV/BDV/PL fact or dimension should use suffix ‘_TD’ if the definition says it’s a time duration. |
| PGE-9 | A PL object can reference another PL object | It is acceptable to access PL objects for creating another PL object. The aim should be to make such references as simple and transparent as possible. When creating new PL objects, the PL object select statements should not involve complex logic on the attributes selected. Eg: Dimension look up is allowed If however, complex logic does need to be performed on the selected attributes from the PL object then it is suggested that a new BDV object is created that contains transformed data from the PL object. This BDV table (plus any other sources of data) can then be used to create the new PL/BDV object. Eg: Using a PL metric to derive another metric in PL is not allowed. |
| PGE-10 | Publication of date and/or time columns | In the PL publish:
|
Dimensions (DM)
| Standard | Overview | Detail |
|---|---|---|
| PDM-1 | Dimensions will follow a common naming standard. | The standard for a dimension is SCD1 dimension follow below naming convention. Type 1 dimensions can be built on top of Type 2 dimension (if available) View should show the latest status of each primary keys. If an SCD2 view exist in PL, then number of unique primary key in both views should be matching. When a record is deleted from source, RECORD_DELETED_FLAG can be used to show this state. This is an optional field If SCD1 is implemented as a view, LOAD_TS should be present as technical column in the view. If multiple tables are involved for building the view, take the max(LOAD_TS) from those tables. See the section on Technical columns for the definition of which are applicable. All SCDs shall contain default record -2 ‘Not Applicable’. Union the default record with query |
| PDM-2 | The Primary Key will be inherited from the Integration Layer and will follow a common naming standard. | The standard for the Primary Key is <Object_Name>_<Datatype Shortcode>, for example CUSTOMER_CD or DATE_DT. The Primary Key of the Dimension will become the Natural Key from the associated Hub, so for example if the Primary Key of the Hub is a two part Natural Key the Primary Key of the Dimension will also be a two part key. The <Object Name> will mirror the name of the Dimension less the suffix, so for example DIM_STORE_LOYALTY dimension will have a Primary Key of STORE_LOYALTY_CD. The <Datatype Shortcode> will be:-
|
| PDM-3 | Dimensions in the Presentation Layer may or may not contain history. | Type 2 Dimensions will contain the following additional column as part of the Primary Key:- VALID_FROM_DT Also this additional attribute will be added:- VALID_UNTIL_DT There are default values for Low and High dates as follows:- Low Date : 01-01-1900 High Date: 31-12-9999 For Dimensions the lowest level at which history is maintained is currently Day level. Difference between VALID_UNTIL_DT and VALID_FROM_DT of adjacent records of the same primary key shall be 1 day. There shouldn’t be gaps of more than 1 day. When a record is deleted from source, RECORD_DELETED_FLAG can be used to show this state. This is an optional field See the section on Technical columns for the definition of which are applicable. If multiple tables are involved for building the view, take the max(LOAD_TS) from those tables. All SCDs shall contain default record -2 ‘Not Applicable’. Union the default record with query |
| PDM-4 | Dimensions must include the Natural Key | The Natural Key will be the Primary Key of the Dimension. It will be visible to Customers so Natural Key selects and joins can be undertaken. |
| PDM-5 | Snowflaking is permitted within Dimensions with a standard pattern for the columns to include. | When a Dimension contains a Foreign Key to another Dimension this will be permitted. To limit the amount of joins at runtime needed after the _PK column has been added the recommendation is for the following columns to be added to the Dimension too:-
|
| PDM-6 | Technical Columns will exist on all Dimensions. | Technical columns will allow the Operations team to track and monitor data loading into the Presentation Layer. See the section on Technical columns for the definition of which are applicable. |
| PDM-7 | The order of columns will be consistent. | With a Dimension the columns will always be ordered as follows:-
|
| PDM-8 | Dimension mandatory and optional columns. | The following cardinality must be followed:-
|
| PDM-9 | Extra meta-data will be added for each Dimension to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension:-
|
| PDM-10 | Extra meta-data will be added for each Dimension Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension Column:-
|
| PDM-11 | Default values will be present in all Dimensions. | The following two rows will be present in all dimensions (including the date and time dimensions):-
For eg:- DIM_SAMPLE has a Key1 and Key2 which forms composite key for the dimension then
|
| PDM-23 | Dimensions that will or may include multiple banners need to contain additional columns. | The existing unique identifier (e.g. STORE_CD) remains as is but the values will be modified to create a unique value to differentiate between the brands. Two columns should be added:
|
Sample Dimension Definition
DIM_STORE (Type 1)
| Attribute | Data Type | Mand? | Description |
|---|---|---|---|
| STORE_CD | VARCHAR | Y | This will be primary key of the Dimension and is inherited from the Integration Layer. First part of the Primary Key. |
| First Attribute | Various | N | Always null cardinality. |
| Second Attribute | Various | N | Always null cardinality. |
| ... | |||
| BATCH_ID | VARCHAR | N | Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process) |
| JOB_NAME | VARCHAR | N | The name of the [sub] process that inserted this record. |
| LOAD_TS | DATETIME | N | System Load DateTime for this row. |
DIM_STORE (Type 2)
| Attribute | Data Type | Mand? | Description |
|---|---|---|---|
| STORE_CD | VARCHAR | Y | This will be primary key of the Dimension and is inherited from the Integration Layer. First part of the Primary Key. |
| VALID_FROM_DT | INTEGER | Y | For SCD Type 2 history tracking. Second part of the Primary Key. |
| VALID_UNTIL_DT | INTEGER | Y | For SCD Type 2 history tracking. |
| First Attribute | Various | N | Always null cardinality. |
| Second Attribute | Various | N | Always null cardinality. |
| ... | |||
| BATCH_ID | VARCHAR | N | Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process) |
| JOB_NAME | VARCHAR | N | The name of the [sub] process that inserted this record. |
| LOAD_TS | DATETIME | N | System Load DateTime for this row. |
Facts and Fact Less Facts (FA)
| Standard | Overview | Detail |
|---|---|---|
| PFA-1 | Facts will follow a common naming standard. | The standard for a Fact is FACT_<object_name>, for example FACT_ITEM_PURCHASE or FACT_PROMOTION_WITHDRAWL. Fact names are always in uppercase. |
| PFA-2 | Fact Less Facts will follow a common naming standard. | The standard for a Fact is FLF_<object_name>, for example FLF_CUSTOMER_OFFER* Fact Less Fact names are always in uppercase. |
| PFA-3 | Facts and Fact Less Facts must inherit the Primary Keys from the Dimensions. | For each Dimension in the fact ensure the _PK column is present in the Fact. As Dimensions contain history relationships between a Fact and a Dimension require the specification of a date element. |
| PFA-4 | Facts and Fact Less Facts must contain either a Date or Date Timestamp. | This denotes that the FACT or the FLF is an event/atomic level fact – i.e. a business event happened on a date or at a time within a date. When a FACT is at a Date and Time grain the Date and Time must be split into separate columns (Please see PGE-10 for full details) |
| PFA-5 | Facts must contain at least one Measure. | At least one Additive or Non-Additive Measure must exist in a Fact. |
| PFA-6 | Fact Less Facts have no Measures. | The difference between a Fact and a Fact Less Fact is that a Fact Less Fact contains no Additive or Non-Additive Measures. |
| PFA-7 | Technical Columns will exist on all Facts. | Technical columns will allow the Operations team to track and monitor data loading into the Presentation Layer. |
| PFA-8 | The order of columns will be consistent. | With a Fact the columns will always be ordered as follows:-
|
| PFA-9 | Dimension mandatory and optional columns. | The following cardinality must be followed:-
|
| PFA-10 | Extra meta-data will be added for each Fact to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension:-
|
| PFA-11 | Extra meta-data will be added for each Fact Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension Column:-
|
| PFA-12 | In Facts all Foreign Key Links are mandatory | If a relationship does not exist to a Dimension the following defaults are used:-
|
| PFA-13 | Create an indicator (suffix ‘_IND’) column in Fact for any respective flag (suffix ‘_FLAG’) column which contains values restricted to ‘Y’/’N’ | In the ‘_IND’ column decode ‘Y’ as 1 and ‘N’ as 0. This will enable the reporting to implement summing logic quite easily. If the flag column contains values other than ‘Y’,’N’ (e.g. ‘U’ for unknown, ‘NA’ for Not applicable, ‘DQ’ for Data Quality’ etc.) then the column should not be decoded to any numeric value as we should not assign any random numbers other than 0,1 for these flag values. |
| PFA-14 | Separate natural keys should be deployed for DIM_TIME if a link is required to the DIM_TIME dimension | If there is a requirement for the fact row to be linked to the DIM_TIME dimension using that datetime attribute, an additional column should be added to the fact table defined as a TIME(9) (natural key) and containing only the time element of the datetime attribute. This will form a natural foreign key to the DIM_TIME dimension on column TIME_TM. |
| PFA-15 | Facts that will or may include multiple banners need to contain an additional column. | The existing unique identifier (e.g. STORE_CD) for the dimension FK remains as is but the values will be modified to create a unique value to differentiate between the brands. One column should be added:
|
Sample Fact Definition
FACT_ITEM_PURCHASE
| Attribute | Data Type | Mand? | Description |
|---|---|---|---|
| ITEM_CD | VARCHAR | Y | This will be primary key of the Dimension represented in the Fact. First part of the Primary Key. |
| PROMOTION_ID | NUMBER | Y | This will be primary key of the Dimension represented in the Fact. Second part of the Primary Key. |
| CUSTOMER_CD | VARCHAR | Y | This will be primary key of the Dimension represented in the Fact. Third part of the Primary Key. |
| PURCHASE_DT | INTEGER | Y | Date of the transaction. Fourth part of the Primary Key. |
| PURCHASE_TM | VARCHAR | Y | Time of the transaction. Fifth part of the Primary Key. |
| First Attribute, e.g. Quantity | Various | N | Always null cardinality. This is an example of a Measure – Additive attribute. |
| Second Attribute, e.g. Till Number | Various | N | Always null cardinality. This is an example of a Measure – Non-Additive attribute. |
| ... | |||
| BATCH_ID | VARCHAR | N | Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process) |
| JOB_NAME | VARCHAR | N | The name of the [sub] process that inserted this record. |
| LOAD_TS | DATETIME | N | System Load DateTime for this row. |
Aggregates (AG)
| Standard | Overview | Detail |
|---|---|---|
| PAG-1 | Aggregates can be rolled up data from Facts, or a daily slice of a combination of Fact and Dimensional data. | There is no separate type of Snapshot object, if required on a daily basis this becomes a Daily Aggregate for example. |
| PAG-2 | Aggregates will follow a common naming standard. | The standard for an Aggregate is AGG_<object_name>_<frequency>, for example AGG_CUST_PURCHASES_D or AGG_PROMO_SUMMARY_M. Aggregate names are always in uppercase. |
| PAG-3 | Frequency supported are Hourly,Daily, Weekly, Monthly, Quarterly and Yearly. | The standard for these <frequency> is:- _H - Hourly _D – Daily _W – Weekly _M – Monthly _Q – Quarterly _Y – Yearly. |
| PAG-4 | Aggregates will have the same standards as Facts but will also include in its Primary Key the link to the reference data about its frequency. | As part of the Primary Key of an Aggregate the following columns must be present:- _H – Hourly – PK of the Hour Dimension _D – Daily – PK of the Date Dimension _W – Weekly – PK of the Week Dimension _M – Monthly – PK of the Month Dimension _Q – Quarterly – PK of the Quarter Dimension _Y – Yearly – PK of the Year Dimension |
| PAG-5 | Technical Columns will exist on all Aggregates. | Technical columns will allow the Operations team to track and monitor data loading into the Presentation Layer. |
| PAG-6 | The order of columns will be consistent. | With a Fact the columns will always be ordered as follows:-
|
| PAG-7 | Dimension mandatory and optional columns. | The following cardinality must be followed:-
|
| PAG-8 | Extra meta-data will be added for each Aggregate to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension:-
|
| PAG-9 | Extra meta-data will be added for each Aggregate Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension Column:-
|
| PAG-10 | In Aggregates all Foreign Key Links are mandatory. | If a relationship does not exist to a Dimension the following defaults are used:-
|
| PAG-11 | Aggregates are different to Extracts. | An Aggregate contains the Primary Key and one of more Measures. The naming is consistent with other Facts and Dimensions elsewhere in the Presentation Layer. An Extract can alter the names of the columns to suit the target audience (may be external), and for ease of access may contain a mixture of Measures and Reference data for ease of access. |
Sample Aggregate Definition
AGG_STORE_PROMO_STATUS_M
| Attribute | Data Type | Mand? | Description |
|---|---|---|---|
| STORE_CD | VARCHAR | Y | This will be primary key of the Dimension represented in the Aggregate. First part of the Primary Key. |
| MONTH_CD | VARCHAR | Y | Frequency of the Aggregation. Second part of the Primary Key. |
| PROMO_STATUS_CD | VARCHAR | Y | A segmentation for the Measures. Third part of the Primary Key. |
| First Attribute, e.g. Status_Cnt | Various | N | Always null cardinality. This is an example of a Measure – Additive attribute. |
| ... | |||
| BATCH_ID | VARCHAR | N | Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process) |
| JOB_NAME | VARCHAR | N | The name of the [sub] process that inserted this record. |
| LOAD_TS | DATETIME | N | System Load DateTime for this row. |
Extracts (EX)
| Standard | Overview | Detail |
|---|---|---|
| PEX-1 | Extracts are for external consumers outside of the Presentation Layer. | An Extract can alter the names of the columns to suit the target audience (may be external), and for ease of access may contain a mixture of Measures and Reference data for ease of access. |
| PEX-2 | Extracts will follow a common naming standard. | The standard for an Extract is EXT_<object_name>_<target>_<frequency> Target is optional and only required for an external audience for example EXT_ARI_DATA_D or EXT_PRODUCT_CATALINA_W Extract names are always in uppercase. |
| PEX-3 | Frequency is optional. | An extract may over-write a previous run and just contain the data at a point in time. In this case the <frequency> is not required. Where a frequency is required it must follow the same naming and Foreign Key standards as an Aggregate. |
| PEX-4 | Attributes can be renamed within Extracts and follow a different naming convention. | Attributes can be renamed from the core Presentation Layer equivalents to suit the target audience. Some extracts may have a fixed naming standard by their nature. |
| PEX-5 | Extracts can be enriched with additional Dimensional values. | Additional Dimension attributes can be added to Extracts if needed. |
| PEX-6 | Extra meta-data will be added for each Extract to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension:-
|
| PEX-7 | Extra meta-data will be added for each Extract Column to enable a Framework driven approach to be developed, to improve reporting and make clear the lineage to the Integration Layer. | The following meta-data will be created and populated for every Dimension Column:-
|
Sample Extract Definition
EXT_STORE_NOTIFICATION_FEED_M
| Attribute | Data Type | Mand? | Description |
|---|---|---|---|
| STORE_CD | VARCHAR | Y | This will be primary key of the Dimension represented in the Extract. First part of the Primary Key. |
| MONTH_CD | VARCHAR | Y | Frequency of the Extract. Second part of the Primary Key. |
| First Attribute, e.g. Duration_In_Progress | Various | N | Always null cardinality. This is an example of a Measure – Additive attribute. |
| Second Attribute, e.g. Completion_Date | Various | N | Always null cardinality. This is an example of a Measure – Non-Additive attribute. |
| ... | |||
| BATCH_ID | VARCHAR | N | Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process) |
| JOB_NAME | VARCHAR | N | The name of the [sub] process that inserted this record. |
| LOAD_TS | DATETIME | N | System Load DateTime for this row. |
PII data in Presentation Layer
Below are revised list of standards for PII data in Presentation Layer. It is no longer required to add a prefix/suffix as ‘PII’ to the database object names or columns.
For deprecated standards for PII data, please refer to earlier version of ADW Presentation Layer Model Standards. The deprecated standards for PII data continue to be applicable for existing PII schemas and objects created within these schemas until decommissioned or moved to other schemas.
Any new objects in the Presentation Layer should be created in accordance with the current PII standards.
| Standard | Overview | Detail |
|---|---|---|
| PII-1 | Clear text PII data will be held in columns classified as PII. | The column with PII data should be classified as PII in MUNIN. It is advised that the Data Classification mentioned in the mapping specifications should be in sync with the Data Classifiation provided in MUNIN. |
| PII-2 | Access to PII columns will be restricted. | Access to PII columns in database objects will be restricted using appropriate role permissions using Barbossa. The Barbossa roles should be created/configured with appropriate policies to provide access to database objects containing PII data. |