Ingestion Pattern for Graph Database
urn:js:virtue:aspire:pattern:.
TL;DR
Two candidate patterns for ingesting data into a graph database (TigerGraph) by way of Snowflake and S3.
Instructions
We recommend Pattern 1, because there are currently no reporting requirements. History for the data will accumulate over time, so if reporting requirements emerge later, the PL can be built then.
Pattern 1 - Persist the data into Snowflake RDV/BDV Schemas
- Load the data from the proposed MVP sources (Snowflake, Mazel, MicroStrategy) into RDV without any transformation
- Perform cleansing, DQ checks and transformation in BDV
- Extract the data from the RDV/BDV layer to the Raw S3 bucket and push it to TigerGraph
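The extract step of Pattern 1 could be sketched as a Snowflake unload to an external stage that points at the Raw S3 bucket. This is a minimal sketch, not the project's implementation: the stage name `LINEAGE_RAW_STAGE`, the table name `LINEAGE_EDGE`, the folder layout and the run date are all hypothetical, and the function only builds the `COPY INTO <location>` statement rather than executing it.

```python
def build_unload_sql(layer: str, table: str, stage: str, run_date: str) -> str:
    """Build a Snowflake COPY INTO <location> statement that unloads one table to a stage.

    All object names passed in are illustrative assumptions, not names from the MVP.
    """
    # Pattern 1 has no PL layer, so only RDV and BDV are valid extract sources.
    if layer not in {"RDV", "BDV"}:
        raise ValueError(f"Pattern 1 extracts only from RDV or BDV, got {layer!r}")

    # Hypothetical folder layout inside the stage: <layer>/<table>/<run_date>/
    target = f"@{stage}/{layer.lower()}/{table.lower()}/{run_date}/"
    return (
        f"COPY INTO {target}\n"
        f"FROM {layer}.{table}\n"
        "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' COMPRESSION = NONE)\n"
        "HEADER = TRUE\n"
        "OVERWRITE = TRUE;"
    )


sql = build_unload_sql("BDV", "LINEAGE_EDGE", "LINEAGE_RAW_STAGE", "2024-01-31")
print(sql)
```

The resulting statement would be run through whichever Snowflake client the platform already uses. Pushing the unloaded files on to TigerGraph is a separate step and is not covered by this sketch.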
Pattern 2 - Persist the data into Snowflake RDV/BDV/PL Schemas
- Load the data from the proposed MVP sources (Snowflake, Mazel, MicroStrategy) into RDV without any transformation
- Perform cleansing, DQ checks and transformation in BDV
- Make the output available in the PL layer in the form of Facts and Dimensions. There are currently no reporting requirements for this layer
- Extract the data from the PL layer to the Raw S3 bucket and push it to TigerGraph
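The difference between the two patterns comes down to which Snowflake layers are populated and which layer feeds the S3 extract. One way to make that explicit is a small configuration sketch like the one below. The class and function names are hypothetical, and the selection rule simply encodes the recommendation given under Instructions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IngestionPattern:
    name: str
    layers: Tuple[str, ...]          # Snowflake schemas populated, in load order
    extract_layers: Tuple[str, ...]  # schemas the S3 extract may read from


# Pattern 1: persist to RDV/BDV and extract from the RDV/BDV layer.
PATTERN_1 = IngestionPattern("Pattern 1", ("RDV", "BDV"), ("RDV", "BDV"))
# Pattern 2: additionally build the PL and extract from it.
PATTERN_2 = IngestionPattern("Pattern 2", ("RDV", "BDV", "PL"), ("PL",))


def choose_pattern(has_reporting_requirements: bool) -> IngestionPattern:
    """Pattern 1 is recommended unless reporting requirements exist."""
    return PATTERN_2 if has_reporting_requirements else PATTERN_1


current = choose_pattern(has_reporting_requirements=False)
print(current.name, current.extract_layers)
```

Because both patterns share the RDV and BDV loads, moving from Pattern 1 to Pattern 2 later would add the PL build and change the extract source, while leaving the earlier layers as they are.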
Appendix
Sample Architecture

Summary
The Solution Architecture above was designed for the Data Lineage MVP in accordance with the Principles listed previously. For the initial MVP, Data Lineage metadata is provided by three sources: Snowflake, Maazel and MicroStrategy. File extracts in different formats are delivered to a Raw S3 bucket. The data is then ingested into Aspire, where the three formats are cleaned and transformed into a consistent dataset and made available in the Snowflake database. Data supporting the specific Data Lineage use case is then extracted to S3 and ingested into the Graph DB for analysis and querying. The Graph DB in this use case is a managed service from TigerGraph.
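The "cleaned and transformed into a consistent dataset" step can be illustrated with a minimal sketch that maps source-specific records onto one shared edge layout and serialises it as CSV for the S3 extract. The per-source field names, the normalisation rule (trim and upper-case) and the output columns are assumptions made for illustration; the real extract formats are not described here.

```python
import csv
import io

# Hypothetical field names per source; the actual extract layouts are not specified.
FIELD_MAPS = {
    "snowflake": {"source": "upstream_object", "target": "downstream_object"},
    "maazel": {"source": "from", "target": "to"},
    "microstrategy": {"source": "source_table", "target": "report_name"},
}


def normalize(system: str, records: list) -> list:
    """Map one source's records onto a common (source, target, origin_system) layout."""
    mapping = FIELD_MAPS[system]
    return [
        {
            # Assumed cleansing rule: trim whitespace and upper-case object names.
            "source": record[mapping["source"]].strip().upper(),
            "target": record[mapping["target"]].strip().upper(),
            "origin_system": system,
        }
        for record in records
    ]


def to_csv(edges: list) -> str:
    """Serialise normalised edges as CSV text, ready to be written to a file for S3."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source", "target", "origin_system"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


edges = normalize(
    "snowflake",
    [{"upstream_object": " raw.orders ", "downstream_object": "bdv.orders"}],
) + normalize(
    "microstrategy",
    [{"source_table": "bdv.orders", "report_name": "Daily Sales"}],
)
output = to_csv(edges)
print(output)
```

A single edge layout with an `origin_system` column keeps the graph load independent of how many sources feed it, so adding a source after the MVP would only need a new entry in the mapping.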