ADW Staging Layer Conventions

urn:js:virtue:aspire:standard:30.1

TL;DR

Definition

The ADW_STAGE schema is the repository for any data loaded into ASPIRE in it’s raw format. This schema can also be considered as the ASPIRE data lake.

For semi-structured data, the stage table should contain a VARIANT type column to store records. The data can be in json/parquet format generated from streaming events/messages.
For structured data sets, stage table should contain multiple columns same as the data feeds(instead of storing the whole data feed in single column). E.g csv file.
There should not be any transformation done while loading the stage table from S3/Curated Layer/Integration Layer/Data Loaded through Snowflake Native Kafka Connector. It should be one to one with the source.
However for ETL processing there can be few technical columns added to the staging table e.g file name. See the section on Technical columns for the definition of which are applicable. https://github.com/sainsburys-tech/aspire-virtues/blob/main/standard/data%20storage/data%20architecture%20management/Technical%20Column%20Standards/README.md
Snowflake stage table DDL's will be mastered in Github, while the stage table models will be mastered in WhereScape in order to generate the RDV models.
Stage table column names need to align to their source file equivalents.
Stage table data may be immutable, as discussed and agreed with Engineering Leads and Data Archiects.
Stage tables containing semi-structured data will be unpacked and flattened out within the Transform Schema logic views for downstream pipeline ingestion.
See Transform Schema Standards: https://github.com/sainsburys-tech/aspire-virtues/tree/main/standard/data%20storage/data%20architecture%20management/Transform%20Schema
Stage tables containing semi-structured data can be unpacked and flattened using a view within ADW_STAGE - This is purely for analysis purposes and cannot be used by downstream pipeline ingestion.
Stage views are not required for staging tables that contain only structured data. </ul> ## Staging Table Naming Convention For stage tables not populated by SIESTA <Operational System Code (Up to 16 characters)\>\_<Source System Code (6 characters)> E.g.: GOL_GOLNRD Note:
For Operational System codes, refer to the OPERATIONAL_SYSTEM_CD column of the ADW_METADATA_PL.OPERATIONAL_SYSTEM table which contains all valid operational system codes.
For Source System codes, refer to the SOURCE_SYSTEM_CD column of the ADW_METADATA_PL.SOURCE_SYSTEM table which contains all valid source system codes. Naming convention has been aligned to accomodate HERA. <https://sainsburys-confluence.valiantys.net/display/HERA/Step+2+-+Patterns+and+Standards> ## Staging Table Naming Convention for SIESTA For stage tables populated by SIESTA: SIESTA_<IRM L1 AREA\>\_<Source System Code (6 characters)>_<Context Description (Up to 12 characters)>_STAGE E.g.: SIESTA_PRODUCT_GOLNRD_NUTRITION_STAGE
Note: SIESTA staging name length is limited 60 characters as per the SIESTA capability standards ## Staging View Naming Convention (Optional) Note: This convention only applies for un-packing views created in ADW_STAGE for analysis purposes only. For stage views pointing to SIESTA stage tables: SIESTA_<IRM L1 AREA\>\_<Source System Code (6 characters)>_<Context Description (Up to 12 characters)>_STAGE_SDVW E.g.: SIESTA_PRODUCT_GOLNRD_NUTRITION_STAGE_SDVW For stage views pointing to Non-SIESTA stage tables: <Operational System Code (Up to 16 characters)\>\_<Source System Code (6 characters)>_SDVW E.g.: GOL_GOLNRD_SDVW Note: SDVW identifies an unpacking view, i.e. Structured Data View
For Operational System codes, refer to the OPERATIONAL_SYSTEM_CD column of the ADW_METADATA_PL.OPERATIONAL_SYSTEM table which contains all valid operational system codes.
For Source System codes, refer to the SOURCE_SYSTEM_CD column of the ADW_METADATA_PL.SOURCE_SYSTEM table which contains all valid source system codes.