Argos Federation

urn:js:virtue:aspire:pattern:.

TL;DR

How to replicate data from the legacy Argos estate to S3.

Instructions

This page describes the pattern for consuming data that is replicated from Argos into Amazon S3. The approach uses Snowflake transient tables to land the data, which is then persisted in separate schemas.

Process Flow

Figure 1

Data from Argos databases is replicated to S3 into PII and Non-PII buckets. The Apple squad replicating the relevant datasets into S3 obtains Data Clinic approval before putting data into S3. Similarly, each squad must obtain Data Clinic approval before consuming the data from S3.

Data can land in formats such as Parquet, CSV, or JSON. In the staging database, which resides in Snowflake, transient tables are created with only the relevant columns, with Time Travel set to 0.

Please note that data in these tables is not persisted. Data in these transient tables mirrors the structure of the data replicated from Argos (restricted to the relevant columns), so it must be transformed into a format that Aspire can consume.
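As a minimal sketch, the landing step above might look like the following Snowflake DDL and load. All table, stage, and column names here are hypothetical, for illustration only:

```sql
-- Transient landing table in the staging database; Time Travel is
-- disabled (retention 0) so the data is not persisted beyond the load.
CREATE TRANSIENT TABLE STAGING.ARGOS_ORDER (
    ORDER_ID     NUMBER,
    CUSTOMER_ID  NUMBER,
    ORDER_DATE   DATE,
    ORDER_AMOUNT NUMBER(18,2)
)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- Load the replicated files from a (hypothetical) external stage over
-- the S3 bucket; Parquet columns are matched to table columns by name.
COPY INTO STAGING.ARGOS_ORDER
FROM @STAGING.ARGOS_S3_STAGE/orders/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

For CSV or JSON landings, the `FILE_FORMAT` clause would change accordingly.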

Transformation is performed in the Legacy Extract (LEGACY_EXT) schema, which contains views over the transient tables in staging. These transformations produce the objects that are persisted as dimensions or tables in the Legacy Integration schema.
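A LEGACY_EXT transformation can be sketched as a view over the staged table, assuming the hypothetical names from the landing example:

```sql
-- Hypothetical LEGACY_EXT view that reshapes the staged Argos data
-- into a form Aspire can consume and tags its source system.
CREATE OR REPLACE VIEW LEGACY_EXT.V_ORDER AS
SELECT
    ORDER_ID,
    CUSTOMER_ID,
    ORDER_DATE,
    ORDER_AMOUNT,
    'ARGOS' AS SOURCE_SYSTEM
FROM STAGING.ARGOS_ORDER;
```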

Data received from the Legacy Extract schema is persisted in the Legacy Integration (LEGACY_INT) schema as objects that can easily be merged with existing Aspire objects or created as new objects in Aspire. The Legacy Integration schema plays a vital role in persisting the entire dataset for a given object, including its history. This historical information will eventually be copied into the RDV as the relevant objects, so that the history is already preserved when the strategic approach is implemented.
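The persistence step above could be sketched as a merge on the business key, so each load accumulates rather than overwrites history. Again, all object names are hypothetical:

```sql
-- Merge the LEGACY_EXT output into the persisted LEGACY_INT object:
-- existing rows are refreshed, new rows are inserted.
MERGE INTO LEGACY_INT.ORDERS t
USING LEGACY_EXT.V_ORDER s
    ON t.ORDER_ID = s.ORDER_ID
WHEN MATCHED THEN UPDATE SET
    t.ORDER_DATE   = s.ORDER_DATE,
    t.ORDER_AMOUNT = s.ORDER_AMOUNT
WHEN NOT MATCHED THEN INSERT
    (ORDER_ID, CUSTOMER_ID, ORDER_DATE, ORDER_AMOUNT, SOURCE_SYSTEM)
VALUES
    (s.ORDER_ID, s.CUSTOMER_ID, s.ORDER_DATE, s.ORDER_AMOUNT, s.SOURCE_SYSTEM);
```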

Views in the presentation layer are modified to include objects from the Legacy Integration schema, so that those objects also expose the data from Argos.
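One way to modify a presentation-layer view, assuming the hypothetical objects used in the earlier examples, is a union of the Aspire object with the persisted Argos data:

```sql
-- Hypothetical presentation-layer view combining the existing Aspire
-- object with the Argos history persisted in LEGACY_INT.
CREATE OR REPLACE VIEW PRESENTATION.V_ORDERS AS
SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, ORDER_AMOUNT
FROM ASPIRE.ORDERS
UNION ALL
SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, ORDER_AMOUNT
FROM LEGACY_INT.ORDERS;
```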

Historic Data in RDV

Under the strategic approach, the RDV will have to be populated with historical information.

Currently, historical information is stored in the Legacy Integration schema (in a form Aspire can consume) and in S3 as Parquet (or another pre-defined file format). If the source systems that populate the RDV hold historical information themselves, they can also populate the RDV tables.

So historical information from Argos for any business entity in the RDV can be populated from any of these three sources.

Federating Data which is already in Snowflake

When migrating data that is already in Snowflake, there is no need to create tables in staging; instead, the Legacy Extract schema reads the data and persists it in the Legacy Integration schema. If no transformations are involved, the presentation layer (PL) can read the data directly from the existing Snowflake tables (e.g. EDWS). Presentation layer views then access the data from the federated schema.
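Skipping the staging layer for data already in Snowflake could look like the following sketch, with hypothetical database and column names standing in for a real source such as EDWS:

```sql
-- Expose the existing Snowflake data through LEGACY_EXT without
-- creating any staging tables.
CREATE OR REPLACE VIEW LEGACY_EXT.V_EDWS_CUSTOMER AS
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM EDWS.PUBLIC.CUSTOMER;

-- Persist it in LEGACY_INT as in the S3-based flow.
INSERT INTO LEGACY_INT.CUSTOMER (CUSTOMER_ID, CUSTOMER_NAME)
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM LEGACY_EXT.V_EDWS_CUSTOMER;
```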

Appendix

Migrated From Confluence

Original Author: Moovendan, Colbert