
ADW Integration Layer Model Standards

urn:js:virtue:aspire:standard:18.1

TL;DR

The standards to be followed by Data Architects when creating the ADW Integration Layer Model.

Definition

The standards for the ADW Integration Layer Data Model are detailed below. The naming convention for each standard follows this approach:

For example, IGE-1 means:

  • I denotes the Integration Layer Solution/Physical Data Model
  • GE denotes the General section
  • 1 is the number of the standard

The shortcut used is clearly indicated in the section names below.

General (GE)

Standard Overview Detail
IGE-1 The Integration Layer will be built using the Data Vault 2.0 modelling methodology. The Integration Layer will consist of these areas:-

Raw Data Vault – Source-independent data loaded from multiple sources and stored in a common format.

Business Data Vault – an extension of a Raw Data Vault that applies selected business rules, denormalisations, calculations, and other query assistance functions in order to facilitate user access and reporting. Business Data Vault tables should be refreshed once their dependent Raw Data Vault tables are refreshed.

Additionally, the integration layer also has these minor sub-layers (schemas):

  • Staging - for ingesting data into Snowflake through an ingestion pipe
  • Temp - for temporary and transient ETL tables
  • Transform - views and tables created to simplify the logic of Presentation Layer objects - the BDV transformation logic may also be included in this area
See the chapters Staging, Temp and Transform at the end of the document.
IGE-2 A combination of the Entity types from Data Vault 2.0 will be utilised. In the Raw Data Vault, the following Entity Types will be implemented:-
  1. Hub
  2. Link
  3. Satellite
  4. Reference.
In the Business Data Vault, the following Entity Types will be implemented:-
  1. Point in Time
  2. Bridge
  3. Reporting
See the detailed sections below for the standards for each Entity Type.
IGE-3 Unique keys will be referred to as Natural Keys. The term Natural Key will be used, and Business Key (although equivalent) will not be used in any models.
IGE-4 To enable lineage to the LDM extra meta-data will be added for each object created. The following meta-data will be created and populated for every object created:-
  1. L5 Subject Area – to enable object to be reported on by each subject area
  2. LDM Entity Source Name – to enable lineage back to the LDM
  3. Solution Model Layer – set to Integration or Presentation
  4. Data Vault Type – to better report on the different purposes of the objects in the Integration Layer
    1. Hub
    2. Satellite
    3. Link
    4. Reference
    5. Point in Time
    6. Bridge
    7. Reporting.
IGE-5 The time portion of a Date Time is optional if not applicable. The time portion of a Date Time is to be set to 00:00:00 if no time is specified.
IGE-6 A BDV object can reference a PL object. It is acceptable to access PL objects from BDV when creating other objects. The aim should be to make such references as simple and transparent as possible.

When creating a new BDV object, the PL object select statements should not involve complex logic on the attributes selected.

E.g. a Dimension lookup is allowed.

If, however, complex logic does need to be performed on the attributes selected from the PL object, then it is suggested that a new BDV object is created containing the transformed data from the PL object. This BDV table (plus any other sources of data) can then be used to create the new PL/BDV object.

E.g. using a PL metric to derive another metric in BDV is not allowed.
IGE-7 Publication of date and/or time columns. In the RDV:
  • If date only is supplied only a date (_DT) column need be included.
  • If time only is supplied (e.g. store opening time) only a time (_TM) column need be included.
  • If date and time are supplied as separate columns publish:
    1. Date (_DT) and time (_TM) columns as supplied
    2. A UTC timestamp (_UTC_TS). Use Snowflake TIMESTAMP_NTZ data type.
  • If date and time are supplied as one column publish:
    1. A timestamp (_TS) column to contain the format as supplied. Use Snowflake TIMESTAMP_NTZ data type if no time zone offset is supplied or use TIMESTAMP_TZ data type if it is.
    2. A UTC timestamp (_UTC_TS). Use Snowflake TIMESTAMP_NTZ data type.
    3. In the case where the supplied timestamp is always UTC then only the UTC_TS column need be published.
Note: if an 'epoch' column is supplied (e.g. seconds relative to 1970), this should be converted to a timestamp column. If felt appropriate, a separate EPOCH_TS_NUM column may also be published.

In the BDV, follow the same standards as for the PL (see PGE-10).
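The IGE-7 conversion rules above can be illustrated with a short sketch. The following Python is illustrative only (the function names are hypothetical, not part of any framework): it shows deriving the _UTC_TS value from a supplied timestamp carrying a time zone offset, and converting a supplied 'epoch' column into a timestamp per the note above.

```python
from datetime import datetime, timezone, timedelta

def to_utc_ts(supplied_ts):
    """Derive the _UTC_TS value from a supplied timestamp.

    If the source supplies a time zone offset (the TIMESTAMP_TZ case),
    convert to UTC; otherwise the value is published as-is
    (the TIMESTAMP_NTZ case)."""
    if supplied_ts.tzinfo is None:
        return supplied_ts  # no offset supplied: publish unchanged
    return supplied_ts.astimezone(timezone.utc).replace(tzinfo=None)

def epoch_to_ts(epoch_seconds):
    """Convert a supplied 'epoch' column (seconds relative to 1970, UTC)
    into a timestamp column, per the IGE-7 note."""
    return datetime(1970, 1, 1) + timedelta(seconds=epoch_seconds)

# A source timestamp supplied with a +01:00 offset...
supplied = datetime(2024, 6, 1, 13, 30, tzinfo=timezone(timedelta(hours=1)))
print(to_utc_ts(supplied))      # 2024-06-01 12:30:00
print(epoch_to_ts(1717245000))  # 2024-06-01 12:30:00
```

The same 12:30 UTC instant is recovered from both representations, which is the point of publishing a common _UTC_TS column alongside the values as supplied.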



Hubs (HU)

Standard Overview Detail
IHU-1 Hubs are used to record the major Entities that underpin the business, the ‘things’ the business owns and stores additional data about. Every row in a Hub is a unique Natural Key for the Entity.

These Natural Keys may come from more than one source system. If the same Natural Key is loaded from more than one system, duplicates are not created; the existing row is re-used.

Reference data about the Hub is stored in linked Satellites.
IHU-2 Hubs don’t have to have Satellites. This is known as a Stub-Hub. A Stub-Hub is simply a list of unique Natural Keys. It is a place-holder for future expansion when reference data about the Entity is available to be modelled. Once this reference data is available then Satellites will be added and the Stub-Hub will become a Hub.

A Stub-Hub is not a sub-type of a Hub; it is the state of a Hub at a point in time.
IHU-3 Hubs will be identified by a Natural Key. Every row in the Hub will contain a Natural Key used to identify the entity in the source systems.
IHU-4 Multi-part Natural Keys will be supported (i.e. Compound Keys). It is possible for a Natural Key from a source system to be composed of a Compound Key over more than one column.
IHU-5 The source system of the data in the Hub will be stored. The source system for the Natural Key will be stored in the Hub.

Where more than one source system supplies the same Natural Key, the source system recorded on the Hub will be the first system to load data into the Hub.
IHU-6 Naming of Hubs. The naming convention of a Hub is

<Entity>_HUB

For example
STORE_HUB
PROMOTION_HUB
IHU-7 Length of Hub name. All Hubs must be 30 characters or less.[Deprecated]
IHU-8 Abbreviations Hub names will not be abbreviated unless they exceed 30 characters.[Deprecated]
IHU-9 Structure of the Hub. Detailed below are the standard columns and naming convention that must be present for each Hub.

No other attributes are to be held on the Hub Entities.

Abbreviations

In the physical model, the Natural Key columns of the Hub will not be abbreviated.

The name of the Natural Key columns must align with the name of the Hub, for example on NECTAR_ACCOUNT_HUB the Natural Keys will be called NECTAR_ACCOUNT_NK1, NECTAR_ACCOUNT_NK2 and NECTAR_ACCOUNT_NK3.
IHU-10 Sub-types of Major LDM Entities will be physicalized as specific Hubs. This will allow the mapping from raw input feeds to be modelled and built more closely.



Sample Hub Definition
STORE_HUB

Attribute Data Type Mand? Description
STORE_NK1 VARCHAR Y Natural Key from the source system.

First part will be mandatory. The other two will be optional (because they are part of the Primary Key they will be loaded with a default value).

Part of the Primary Key.
STORE_NK2 VARCHAR Y Second Natural Key value.

Part of the Primary Key.
STORE_NK3 VARCHAR Y Third Natural Key value.

Part of the Primary Key.
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
LOAD_TS DATETIME Y System date when the record was inserted. Not part of any data feed values.
SOURCE_SYSTEM_CD VARCHAR Y 6 character string to uniquely identify the source system that supplied the row of data in the Hub.

Note: this is the column order; no additional columns can be added to a Hub.
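The IHU-1/IHU-5 loading behaviour (rows re-used when the same Natural Key arrives from a second system, with the first loader recorded as the source system) can be sketched as below. This illustrative Python uses an in-memory dict in place of STORE_HUB; the column names follow the sample definition above, but the load logic and data values are hypothetical.

```python
from datetime import datetime

# In-memory stand-in for STORE_HUB, keyed on the compound Natural Key.
store_hub = {}

def load_hub_row(nk1, nk2, nk3, source_system_cd, batch_id):
    """Insert-if-absent: if the same Natural Key arrives from a second
    source system, the existing row is re-used (IHU-1) and the Hub keeps
    the first system that loaded it (IHU-5)."""
    # Optional NK parts are loaded with a default value (they are part
    # of the Primary Key and so cannot be null).
    key = (nk1, nk2 or "-", nk3 or "-")
    if key not in store_hub:
        store_hub[key] = {
            "BATCH_ID": batch_id,
            "JOB_NAME": "load_store_hub",
            "LOAD_TS": datetime.now(),
            "SOURCE_SYSTEM_CD": source_system_cd,
        }
    return store_hub[key]

load_hub_row("S001", None, None, "R09ITM", "b1")
load_hub_row("S001", None, None, "DPPSTR", "b2")  # duplicate NK: row re-used
print(len(store_hub))                                      # 1
print(store_hub[("S001", "-", "-")]["SOURCE_SYSTEM_CD"])   # R09ITM
```

The second load from DPPSTR adds no row; the Hub retains R09ITM as the source system because it loaded the key first.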

Satellites (SA)

Standard Overview Detail
ISA-1 Satellites hold data about Hubs that may change over time. A delta driven record of descriptive information that can be accessed either for the current time or a time period in the past.

No history is stored in the Hub, all of the History is stored in the Satellites.
ISA-2 Satellites will be split by source system. Because of the nature of the Integration Layer, each distinct source system will have a separate Satellite.

This allows rapid load of data into the Integration Layer. If the business rules change at a later date, a choice about an alternative data source can be made more easily than by loading the data again.
ISA-3 Every row in a Satellite will contain a date and time of when the data was loaded. System Load Dates will be mandatory.

For example on 1-Feb 13:03 a record was loaded for a Satellite, this is the date and time that the data change was first notified.
ISA-4 History will be maintained via an insert policy, i.e. only a System Load DateTime will be maintained as part of the primary key. This ensures that when a new piece of data whose history we want to track becomes available, the model does not need two changes made: one to close off the existing entry and one to create the new entry.
ISA-5 Only changes will be written to Satellites. On a day by day basis, if a row is presented that contains no changes from the latest row in the Satellite, then no additional rows are created.

The System Load DateTime in a Satellite indicates when a change arrived from source.
ISA-6 The Primary Key of a Satellite will be a two part key. Satellite Primary Keys consist of:-
  1. Natural Key(s) of the Hub
  2. System Load DateTime
ISA-7 Naming of Satellites. The naming convention of a Satellite is
<Entity>_<Context>_<Source System>_SAT
<Context> is optional.
  • This will only be applicable when there is a need to create multiple satellite tables for a hub for the same source system code.
  • When added, Context becomes an extension to the Entity name.
  • A typical use case is a single source file containing data at different grains.
  • e.g. Customer data that has phone numbers and addresses nested in a semi-structured format. In this case, having the context allows multiple satellite tables to be created for the same hub and source system code.
    Assuming SCVCUS as the source system code, the multiple satellite tables would be: CUSTOMER_SCVCUS_SAT, CUSTOMER_ADDRESS_SCVCUS_SAT, CUSTOMER_PHONE_SCVCUS_SAT

For example
STORE_R09ITM_SAT
PROMOTION_DPPSTR_SAT
CUSTOMER_ADDRESS_SCVCUS_SAT
The SOURCE_SYSTEM_CD will exist on each row in a Satellite, and the same value will exist in the Satellite name <Source System> above.
ISA-8 Length of Satellite name. All Satellites must be 30 characters or less.[Deprecated]
ISA-9 Abbreviations Satellite names will not be abbreviated unless they exceed 30 characters.[Deprecated]
ISA-10 Structure of the Satellite. Detailed below are the standard columns and naming convention that must be present for each Satellite.
Abbreviations
The Natural Keys of a Satellite are inherited from the Hub. The Natural Keys cannot be changed or abbreviated.
Attribute columns of the Satellites will be abbreviated in line with the abbreviations mastered in the L5 Logical model.
ISA-11 Business effective dates are attributes within the Satellite. The Key for the Satellite is the System Load Date.
If an attribute within the Satellite is business date sensitive, for example the date that a customer status changed, then this will have to be supplied by the source system and stored as an attribute on the Satellite.
ISA-12 Only by exception will Satellites with non-standard Primary Keys be created. In the Data Vault 2.0 standards such targets are known as Multi-Active Satellites.

“A Multi-Active Satellite is a structure built to house multiple active child records, where these child records have no stand-alone business key of their own. A prime example would be an employee with multiple active addresses at the same time.“

Sample Satellite Definition

STORE_RMS9_SAT

Attribute Data Type Mand? Description
STORE_NK1 VARCHAR Y Natural Key from the source system.

First part will be mandatory. The other two will be optional (because they are part of the Primary Key they will be loaded with a default value).

Part of the Primary Key.
STORE_NK2 VARCHAR Y Second Natural Key value.

Part of the Primary Key.
STORE_NK3 VARCHAR Y Third Natural Key value.

Part of the Primary Key.
LOAD_TS DATETIME Y System Load Date and Time for this row, i.e. the time it was discovered and loaded into the Integration Layer.
Final part of the Primary Key.
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
RECORD_DELETED_FLAG CHAR N Used to denote whether the Natural Key is no longer being supplied by the source system.
HASH_DIFF NUMBER(38,0) N A hash key based on the non key and non-technical attributes, i.e. the columns from the source systems. It may be used by the code frameworks for determining whether a change has occurred in the source record compared to the current record, i.e. used for CDC processing.
RECORD_ID VARCHAR N A UUID/GUID to uniquely identify each row of data.
SOURCE_SYSTEM_CD VARCHAR Y 6 character string to uniquely identify the source system that supplied the row of data in the Hub.
DQ_AUDIT VARCHAR N JSON message of any error/warnings on this row of data from the DQ audit rules within the Curated Data Lake layer.
First column from source system Various N Always null cardinality.
Second column from source system Various N Always null cardinality.
Etc… Various N Always null cardinality.
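The HASH_DIFF/CDC idea behind ISA-5 can be sketched as follows. The actual hashing algorithm is owned by the code frameworks; this Python example assumes one possible scheme (SHA-256 over a delimited, ordered rendering of the source columns, truncated to fit NUMBER(38,0)) purely for illustration, with hypothetical function names.

```python
import hashlib

def hash_diff(source_columns):
    """Compute a HASH_DIFF over the non-key, non-technical columns.
    Illustrative scheme only: SHA-256 of 'k=v' pairs in column-name
    order, reduced modulo 10**38 to fit NUMBER(38,0)."""
    rendered = "|".join(f"{k}={source_columns[k]}" for k in sorted(source_columns))
    digest = hashlib.sha256(rendered.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10**38

def is_changed(incoming, latest_hash_diff):
    """ISA-5: only write a new Satellite row when the incoming record
    differs from the latest row already stored for this Natural Key."""
    return hash_diff(incoming) != latest_hash_diff

latest = hash_diff({"STORE_NAME": "Holborn", "STORE_TYPE": "Local"})
print(is_changed({"STORE_NAME": "Holborn", "STORE_TYPE": "Local"}, latest))  # False
print(is_changed({"STORE_NAME": "Holborn", "STORE_TYPE": "Super"}, latest))  # True
```

An unchanged record produces the same HASH_DIFF and creates no new row; any attribute change produces a different hash and triggers an insert with a new System Load DateTime.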



Standard Overview Detail
ILK-1 Links hold relationships between two or more Natural Keys at a point in time. Links are typically transactions within the business between two or more Entities, for example Online Purchase, Product Promotion or New Manager for a Store.
ILK-2 Links always have a time or datetime element. Links never have effective from and to date ranges, they record the instance of an interaction.
Links are never Hubs.
Links are inherently associative in nature.
ILK-3 Links can store relationships between Hubs of the same type. Two or more Natural Keys belonging to the same Hub can have a relationship.
Common examples are Parent/Child relationships, or ‘same-as’ which links two different natural keys within the same Hub together.
These will always be named as <HUB>_RELATED_LINK.
ILK-4 The Hub Natural Key will be denormalised into the Link. The Natural Keys from the Hubs cannot be renamed or abbreviated.
ILK-5 Every Link needs to have a Reason for the relationship. The reason will describe why more than one Natural Key (Hub) is being joined.
For example
  1. Customer accepts a Promotion
  2. Employee becomes the Store Manager
  3. Order is Fulfilled
Having a Reason Code allows many-to-many relationships to be handled by the same Link. For example, for a Customer and an Order, the CUSTOMER_ORDER_LINK target could contain reasons of…
  1. Customer Placed Order (1-Feb)
  2. Customer Cancelled Order (2-Feb)
NB
REASON_CD – Has two modes of usage

USE CASE 1 – The REASON_CD can be the same in Link tables created for data from the same business process or event to uniquely identify the reason why hubs from the same business process or event are being joined at this time

USE CASE 2 – The REASON_CD can be different for Link tables created for data from the same business process or event when required to provide context and filter records.
ILK-6 The Primary Key of a Link will be a Compound Key. Link Primary Keys consist of:-
  1. Natural Key of the Hub #1
  2. Natural Key of the Hub #2
  3. Natural Key of the Hub #x (optional)
  4. System Load DateTime
  5. Source System Code
  6. Reason Code
  7. {Optional Degenerate Field}
In some cases a degenerate field will be added to the Primary Key in case the Hub Primary Keys are not enough to make the row unique. For example a sequence number on an invoice, the page number of a book or the sequence number of a complex message may be required.
ILK-7 Naming of Links. The naming convention of a Link is
<Hub#1>_<Hub#2>_<Hub#x>_LINK
For example
STORE_EMPLOYEE_LINK
STORE_RELATED_LINK
PROMOTION_PRODUCT_LINK

Alternatively, in the case of a transactional Link, or where the number of Hubs make the name resulting from <Hub#1>_<Hub#2>_<Hub#x>_LINK unwieldy, a Link can be named according to the subject matter.  For example: PURCHASE_ORDER_LINK.
ILK-8 Length of Link name. All Links must be 30 characters or less.[Deprecated]
ILK-9 Abbreviations Link names will not be abbreviated unless they exceed 30 characters.[Deprecated]
ILK-10 Structure of the Link. Detailed below are the standard columns and naming convention that must be present for each Link.
Abbreviations
The Natural Keys of a Link are inherited from the Hub. The Natural Keys cannot be changed or abbreviated.
Measure columns of the Links will be abbreviated in line with the abbreviations mastered in the L5 Logical model.
ILK-11 Links don’t have a Hash Key. Due to the cost of calculating a Hash Key for high volume transactional data, a unique Hash Key will not be generated for a Link.
In the Data Vault 2.0 standards it is a rule to create Hash Keys for Links, this is to allow a Satellite to be connected to a Link. However our standard (below) will add data that would have gone onto a Satellite directly onto the Link.
ILK-12 A Link can contain additional data about the event. Additional fields from the IGE-8 source feed can be added as attributes directly to the Link. This will avoid creating a second Satellite for each Link and will reduce high volume joins at run-time.
ILK-13 Duplicates will be loaded. A Hash Key on Link tables could be used to ensure duplicates are not loaded into the Integration Layer; however, as Links do not carry a Hash Key (see ILK-11), duplicates are not prevented that way.

We will instead make use of the physical process/batch load meta-data to ensure duplicate data runs are not processed.
ILK-14 Links can be denormalised if the volumetrics suit this pattern of model. On a case by case basis large transactional events, for example place an order, web traffic history, sales basket content can be written in a denormalised manner for performance optimisation.

These are known as Transactional-Links and one difference is that they contain a Transaction Date and/or Time as part of the Primary Key. This means the Load Datetime can be made a Measure of the Link rather than part of the Primary Key.
Attribute Data Type Mand? Description
STORE_NK1 VARCHAR Y This will be part of the Primary Key of the Hub and is a Foreign Key in this Link.
First part of the Link Primary Key.
STORE_NK2 VARCHAR Y This will be part of the Primary Key of the Hub and is a Foreign Key in this Link.
First part of the Link Primary Key.
STORE_NK3 VARCHAR Y This will be part of the Primary Key of the Hub and is a Foreign Key in this Link.
First part of the Link Primary Key.
EMPLOYEE_NK1 VARCHAR Y This will be part of the Primary Key of the Hub and is a Foreign Key in this Link.
Second part of the Link Primary Key.
EMPLOYEE_NK2 VARCHAR Y This will be part of the Primary Key of the Hub and is a Foreign Key in this Link.
Second part of the Link Primary Key.
EMPLOYEE_NK3 VARCHAR Y This will be part of the Primary Key of the Hub and is a Foreign Key in this Link.
Second part of the Link Primary Key.
LOAD_TS DATETIME Y System Load DateTime for this row, i.e. the date it was discovered and loaded into the Integration Layer.
Third part of the Link Primary Key.
REASON_CD VARCHAR Y 6 character string to uniquely identify reason why these Hubs are being joined at this point in time.
Fourth part of the Link Primary Key.
SOURCE_SYSTEM_CD VARCHAR Y 6 character string to uniquely identify the source system that supplied the row of data in the Hub.
Fifth part of the Link Primary Key.
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
RECORD_ID VARCHAR N A UUID/GUID to uniquely identify each row of data.
DQ_AUDIT VARCHAR N JSON message of any error/warnings on this row of data from the DQ audit rules within the Curated Data Lake layer.
First column from source system Various N Always null cardinality. We would expect the business date [and time] of this transaction to be made available here.
Second column from source system Various N Always null cardinality.
Etc... Various N Always null cardinality.

Reference (RE)

Standard Overview Detail
IRE-1 Naming of References. The naming convention of a Reference object is
<Entity>_REF
For example
SOURCE_SYSTEM_REF
REASON_REF
IRE-2 Structure of the Reference. Detailed below are the standard columns and naming convention that must be present for each Reference object.
  • Primary Key columns
  • Load Datetime
  • Description
  • Any other attributes
  • Technical Columns.
IRE-3 References can either be point in time or historised. The Load Datetime can be used as required.
IRE-4 Reference objects are keyed on Codes and do not require a hash key. Because of the small amount of data a hash key is not required for each row in a Reference object.

Sample Reference Definition

SOURCE_SYSTEM_REF

Attribute Data Type Mand? Description
SOURCE_SYSTEM_CD VARCHAR Y This will be the primary key of the Reference and will be used as a Foreign Key in other objects.
First part of the Primary Key.
LOAD_TS DATETIME Y System Load DateTime for this row, i.e. the date it was discovered and loaded into the Integration Layer.
BATCH_ID VARCHAR N Unique process value to identify the job instance (e.g. Ab Initio ETL, Snowpipe, Python process)
JOB_NAME VARCHAR N The name of the [sub] process that inserted this record.
RECORD_ID VARCHAR N A UUID/GUID to uniquely identify each row of data.
SOURCE_SYSTEM_DESC VARCHAR Y Description of the Reference data Code.
DQ_AUDIT VARCHAR N JSON message of any error/warnings on this row of data from the DQ audit rules within the Curated Data Lake layer.
Other attributes… Various N Always null cardinality.
Etc… Various N Always null cardinality.



Point in Time (PT)

Standard Overview Detail
IPT-1 Point in Time tables are used to simplify and speed up Presentation Layer generation. A Point In Time target is a System Driven Satellite loaded with a Hub's Natural Key(s) and the surrounding Satellites' Primary Key values. These PKs are a full copy of the entire PK housed in the Satellite.
IPT-2 Point in Time tables are physicalized. The tables will be created as physical tables in the Business Data Vault.
IPT-3 Point in Time tables only contain reference data linked to a Hub. The scope of each Point in Time target is a single Hub and all its associated Satellites.
IPT-4 Point in Time must contain the Primary Key of each Satellite. This is achieved by having a column per Satellite, populated with the latest LOAD_TS of that Satellite at the time the Point in Time table is created.
IPT-5 A Point in Time target will be created every time at least one of the Satellites has a new record inserted. The DateTime part of the Primary Key of a Point in Time target will contain the Date and Time that one of the Satellites last had a row inserted. PIT tables can support intra-day changes due to the Time element.
IPT-6 Optionally a Point in Time table may be created to represent data changing based on Business Dates stored in the Satellites. The DateTime part of the Primary Key of a Point in Time target can be based on a Business datetime, this will allow a view of the Hub and its Satellite data to be viewable by business dates and not on the day it was loaded.
IPT-7 The Primary Key of a Point in Time. PIT Primary Keys consist of:-
  1. Natural Keys of the Hub
  2. System Load DateTime
IPT-8 Naming of Point in Time. The naming convention of a PIT is
<Entity>_PIT
For example
STORE_PIT
PROMOTION_PIT
IPT-9 Structure of the Point in Time. Detailed below are the standard columns and naming convention that must be present for each PIT:-
  1. Primary Key
  2. Load Datetime for latest row in each Satellite linked to the Hub
  3. Technical Columns.
IPT-10 Stubbing a Satellite. Stubs with a low date are not created in Satellites. This means that in a PIT table a <null> Load Datetime can exist for Satellites that contain no data.
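The PIT population rules in IPT-4 and IPT-10 can be sketched as below. This illustrative Python uses in-memory dicts in place of Satellite tables; the table and column names are hypothetical.

```python
# Illustrative Satellite rows: natural_key -> list of LOAD_TS values.
sat_a = {("S001",): ["2024-01-01 09:00:00", "2024-02-01 13:03:00"]}
sat_b = {}  # Satellite with no rows for this key (no low-date stub, IPT-10)

def build_pit_row(natural_key, satellites):
    """IPT-4/IPT-10 sketch: one column per Satellite holding the latest
    LOAD_TS for this Hub key, or None where the Satellite has no data."""
    row = {"NK": natural_key}
    for sat_name, rows in satellites.items():
        load_ts_list = rows.get(natural_key)
        # ISO-format strings order correctly, so max() gives the latest.
        row[sat_name] = max(load_ts_list) if load_ts_list else None
    return row

pit = build_pit_row(("S001",), {"STORE_A_SAT": sat_a, "STORE_B_SAT": sat_b})
print(pit["STORE_A_SAT"])  # 2024-02-01 13:03:00
print(pit["STORE_B_SAT"])  # None
```

Because no stub rows with a low date are written to empty Satellites, the PIT simply carries a null Load Datetime for those columns until data arrives.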



Bridge (BR)

Standard Overview Detail
IBR-1 The Bridge Table is a combination of primary keys and natural keys spread across multiple Hubs and Links. They can be thought of as “base level Fact Tables”. They provide a snapshot of key structures and are generally not temporal in nature. That said, because Bridge Tables live within the Business Data Vault, they can also house computed fields, and / or temporality.
IBR-2 Bridge tables are physicalized. The tables will be created as physical tables in the Business Data Vault.
IBR-3 Bridge tables enable virtualisation downstream. The most common use of a Bridge table in the Business Data Vault is to allow one or more Presentation Layer objects to be created as Views.
IBR-4 The grain of the Bridge table matches the expected downstream usage. The grain does not need to match any particular combination of Link tables, but rather the expected downstream target. This is because the primary purpose of a Bridge table is to allow virtualisation of the Presentation Layer target (typically via a View).
IBR-5 The Bridge table can meet two technical purposes:-
  • Denormalise one or more Attributes (from Satellites) and/or Measures (from Links) to allow simpler and quicker access to the attributes required in a Presentation Layer view
  • Write the Primary Keys for two or more Links or Satellites to allow quicker access to the reference data linked to a transaction at the time of the transaction creation.
Columns are denormalised onto the Bridge table as per the requirements stated on the left.
IBR-6 The Primary Key of a Bridge. Bridge Primary Keys consist of:-
  1. Natural Keys of the target
  2. System Load DateTime
IBR-7 Naming of Bridge. The naming convention of a Bridge is
<Entity>_BR
For example
SALES_TRANS_LINE_BR
ITEM_MRCHSE_HIERARCHY_BR
IBR-8 Structure of the Bridge. Detailed below are the standard columns and naming convention that must be present for each Bridge:-
  1. Primary Key
  2. Load Datetime
  3. Attributes
  4. Technical Columns.
IBR-9 Column Naming Convention. Since the Bridge table exists to support downstream/PL objects, it is not required to follow the NK[1,2,3] naming convention for Natural Keys; it can instead adhere to PL naming conventions.



Reporting (RP)

Standard Overview Detail
IRP-1 A Reporting table is used to simplify and store calculations that are used downstream. This allows a single version of the truth target to be created in the Business Data Vault.
IRP-2 The Reporting table will contain metrics, KPIs, intermediate calculations, results etc. A variety of calculations can be created in Reporting tables on a case by case basis.
IRP-3 The grain of a Reporting table will be suitable for the downstream use. Reporting tables do not have to mirror the Primary Keys of objects in the Raw Data Vault, for example a rollup to KPI level can occur for a transactional feed.
IRP-4 Reporting tables are physicalized. The tables will be created as physical tables in the Business Data Vault.
IRP-5 Naming of Reporting table. The naming convention of a Reporting table is
<Entity>_REP
For example
SALES_TRANS_REP
FINANCE_KPI_REP
IRP-6 Structure of the Reporting table. Detailed below are the standard columns and naming convention that must be present for each Reporting table:-
  1. Primary Key
  2. Load Datetime
  3. Attributes
  4. Technical Columns.
IRP-7 Column Naming Convention. Since the Reporting table exists to support downstream/PL objects, it is not required to follow the NK[1,2,3] naming convention for Natural Keys; it can instead adhere to PL naming conventions.



PII Data in Integration Layer

Standard Overview Detail
PII-1 Clear text PII data will be held in columns classified as PII. Columns with PII data should be classified as PII_Customer or PII_Colleague as appropriate in MUNIN. The Data Classification stated in the mapping specifications should be in sync with the Data Classification provided in MUNIN.
PII-2 Access to PII columns will be restricted. Access to PII columns in database objects will be restricted using appropriate role permissions via Barbossa. The Barbossa roles should be created/configured with appropriate policies to provide access to database objects containing PII data.
PII-3 Natural Keys are clear text PII. Hash the PII attribute and store it under the NK column. Store the clear text PII as an additional attribute in the Hub. The attribute need not have a 'PII' prefix.
For example, current state: for NECTAR_COLLECTOR_CARD_HUB - NECTAR_COLLECTOR_CARD_NK1, PII_NECTAR_COLLECTOR_CARD_NUM
Future state: NECTAR_COLLECTOR_CARD_NK1, NECTAR_COLLECTOR_CARD_NUM - the PII prefix can be removed as there is no need to explicitly include the prefix as a classification.
PII-4 Natural Keys are non-PII but some of the Attributes are PII. PII clear text values can be stored in suitable RDV objects - SAT/LINK.
Column names are to be stored without the 'PII' prefix.
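The PII-3 pattern (hashed value in the NK column, clear text kept as a separate attribute) can be sketched as below. The choice of SHA-256 here is an assumption for illustration only; the actual hashing function is determined by the implementing framework, and the card number shown is fabricated.

```python
import hashlib

def pii_natural_key(clear_text):
    """PII-3 sketch: hash a clear-text PII value for use in the NK column.
    SHA-256 (upper-case hex) is assumed here purely for illustration."""
    return hashlib.sha256(clear_text.encode("utf-8")).hexdigest().upper()

card_num = "98263000012345678"  # fabricated example value
row = {
    "NECTAR_COLLECTOR_CARD_NK1": pii_natural_key(card_num),  # hashed NK
    "NECTAR_COLLECTOR_CARD_NUM": card_num,  # clear text kept as attribute
}
print(len(row["NECTAR_COLLECTOR_CARD_NK1"]))              # 64
print(row["NECTAR_COLLECTOR_CARD_NK1"] != card_num)       # True
```

The Hub key is then stable and non-identifying, while the clear-text column can be classified and access-restricted independently (PII-1/PII-2).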

Appendix