Ingestion Pattern for Graph Database

urn:js:virtue:aspire:pattern:.

TL;DR

Patterns for ingesting data into a graph database.

Instructions

As there are currently no reporting requirements, we recommend Pattern 1. The history of the data will be built up over time, so if reporting requirements arise later, the PL can be built then.

Pattern 1 - Persist the data into Snowflake RDV/BDV Schemas

Load the data from the proposed MVP sources (Snowflake, Mazel, MicroStrategy) into RDV without any transformation.
Perform cleansing, data quality (DQ) checks and transformation in BDV.
Extract the data from the RDV/BDV layer into the Raw S3 bucket and push it to TigerGraph.
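As a minimal sketch of the last step of Pattern 1, the Python below turns cleansed lineage rows (as they might be read from BDV) into a vertex CSV and an edge CSV that could be staged in the Raw S3 bucket for a TigerGraph loading job. The column names (`source_object`, `target_object`, `object_id`, `from_id`, `to_id`), the single vertex type and single edge type, and the DQ rule are assumptions for illustration; the Snowflake read, the S3 upload and the TigerGraph loading job itself are not shown.

```python
import csv
import io
from typing import Iterable, Mapping


def build_graph_files(rows: Iterable[Mapping[str, str]]) -> tuple[str, str]:
    """Turn lineage rows into vertex and edge CSV text.

    Hypothetical input shape: each row has 'source_object' and 'target_object'
    (e.g. a table feeding a report). These column names are assumptions.
    """
    vertices: set[str] = set()
    edges: list[tuple[str, str]] = []
    for row in rows:
        src = row["source_object"].strip()
        tgt = row["target_object"].strip()
        # Assumed DQ rule: skip rows with a missing endpoint.
        if not src or not tgt:
            continue
        vertices.update((src, tgt))
        edges.append((src, tgt))

    # Deduplicate edges while keeping first-seen order.
    unique_edges = list(dict.fromkeys(edges))

    vertex_buf = io.StringIO()
    vertex_writer = csv.writer(vertex_buf, lineterminator="\n")
    vertex_writer.writerow(["object_id"])
    for vertex in sorted(vertices):
        vertex_writer.writerow([vertex])

    edge_buf = io.StringIO()
    edge_writer = csv.writer(edge_buf, lineterminator="\n")
    edge_writer.writerow(["from_id", "to_id"])
    edge_writer.writerows(unique_edges)

    return vertex_buf.getvalue(), edge_buf.getvalue()
```

Writing vertices and edges as separate files mirrors how graph loaders typically map one file to one vertex or edge type; duplicate rows are collapsed so repeated extracts do not create parallel edges.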

Pattern 2 - Persist the data into Snowflake RDV/BDV/PL Schemas

Load the data from the proposed MVP sources (Snowflake, Mazel, MicroStrategy) into RDV without any transformation.
Perform cleansing, data quality (DQ) checks and transformation in BDV.
Make the output available in the PL layer as facts and dimensions. There are currently no reporting requirements for this layer.
Extract the data from the PL layer into the Raw S3 bucket and push it to TigerGraph.
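Both patterns end by unloading Snowflake tables to S3. A hedged sketch, assuming an external stage (hypothetically named `RAW_S3_STAGE`) already points at the Raw S3 bucket, is to generate a Snowflake `COPY INTO @stage` unload statement for each PL fact or dimension table. The table, stage and path names are hypothetical, and the identifier check is a simple guard rather than a full implementation of Snowflake's identifier rules.

```python
import re

# Simple dotted, unquoted Snowflake identifiers such as PL.DIM_OBJECT
# (assumed naming; quoted identifiers are deliberately not accepted).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def build_unload_sql(table: str, stage: str, prefix: str) -> str:
    """Return a Snowflake COPY INTO statement that unloads a table to a stage.

    'stage' is assumed to be an external stage already pointing at the
    Raw S3 bucket; 'prefix' is the folder path inside that stage.
    """
    for name in (table, stage):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    prefix = prefix.strip("/")
    return (
        f"COPY INTO @{stage}/{prefix}/ "
        f"FROM (SELECT * FROM {table}) "
        "FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE) "
        "HEADER = TRUE OVERWRITE = TRUE;"
    )
```

For example, `build_unload_sql("PL.DIM_OBJECT", "RAW_S3_STAGE", "lineage/dim_object")` produces an unload of that dimension into `lineage/dim_object/` with a header row. Under Pattern 1 the same statement would point at an RDV or BDV table instead.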

Appendix

Sample Architecture


Summary

The solution architecture above has been designed for the Data Lineage MVP in accordance with the principles listed previously. For the initial MVP, Data Lineage metadata will be provided by three sources: Snowflake, Maazel and MicroStrategy. File extracts in different formats are delivered to a Raw S3 bucket. The data is then ingested into Aspire, where the three formats are cleaned and transformed into a consistent dataset and made available in the Snowflake database. Data supporting the specific Data Lineage use case is then extracted to S3 and ingested into the graph database for analysis and querying. The graph database for this use case is TigerGraph, provided as a managed service.
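The cleaning step in Aspire, where three source formats become one consistent dataset, can be pictured as one normalizer per source that maps into a common record. This is a minimal sketch only: the input shape assumed for each source, the `LineageRecord` fields and the cleansing rule (trim whitespace, drop records with a missing endpoint) are all hypothetical, not the actual Aspire logic or the real extract formats.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class LineageRecord:
    """Common lineage shape shared by all sources (hypothetical fields)."""
    system: str
    source_object: str
    target_object: str


# Each normalizer maps one (hypothetical) source record to zero or more LineageRecords.
def _from_snowflake(rec: dict[str, Any]) -> list[LineageRecord]:
    # Assumed shape: {"upstream": "DB.SCHEMA.TABLE", "downstream": "DB.SCHEMA.VIEW"}
    return [LineageRecord("Snowflake", rec["upstream"], rec["downstream"])]


def _from_maazel(rec: dict[str, Any]) -> list[LineageRecord]:
    # Assumed shape: {"src": "...", "dst": "..."}
    return [LineageRecord("Maazel", rec["src"], rec["dst"])]


def _from_microstrategy(rec: dict[str, Any]) -> list[LineageRecord]:
    # Assumed shape: {"report": "...", "source_tables": ["...", ...]}
    return [
        LineageRecord("MicroStrategy", table, rec["report"])
        for table in rec["source_tables"]
    ]


NORMALIZERS: dict[str, Callable[[dict[str, Any]], list[LineageRecord]]] = {
    "snowflake": _from_snowflake,
    "maazel": _from_maazel,
    "microstrategy": _from_microstrategy,
}


def normalize(source: str, records: list[dict[str, Any]]) -> list[LineageRecord]:
    """Convert raw records from one source into cleaned LineageRecords."""
    try:
        to_common = NORMALIZERS[source.lower()]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None

    cleaned: list[LineageRecord] = []
    for rec in records:
        for r in to_common(rec):
            src, tgt = r.source_object.strip(), r.target_object.strip()
            # Assumed cleansing rule: drop records with a missing endpoint.
            if src and tgt:
                cleaned.append(LineageRecord(r.system, src, tgt))
    return cleaned
```

Keeping one small function per source isolates format differences, so adding a fourth source after the MVP would mean adding one normalizer rather than changing the shared cleansing logic.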