
Data Sharing Pattern for External Consumers

urn:js:virtue:aspire:pattern:.

TL;DR

Data sharing pattern for consumers external to JS.

Instructions

These are in order of preference.

Anti Patterns

These are some of the options which are unsafe and should be avoided:

  • FTPs
  • Emails
  • Manual Sharing
  • Copy and Paste
  • Screen Snapshots

Pattern 1 - Snowflake Data Sharing

For Same Region Same Cloud

Assuming the 3rd party we want to share data with is also a Snowflake customer in the same region and on the same cloud provider:

Process Flow

Figure 1

Pros:

  1. No Data Movement
  2. Real Time Access
  3. Consistent Data Across Multiple Consumers
  4. Better Data Governance (Controlled, Customised Views)
  5. Simple to Implement (No extracts to build, no APIs to write, no additional software to install etc)

Cons:

  1. Cannot clone or perform any DML changes on a table that was imported from a share
  2. Consumers pay for their own compute, so inefficient warehouse usage increases their costs

Further information: https://docs.snowflake.com/en/user-guide/data-sharing-intro.html
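
As an illustration of how little is needed to set up a same-region share, the sketch below generates the Snowflake SQL a provider and a consumer would typically run. It is a minimal sketch, not our agreed implementation: the share, database, schema, view and account names are hypothetical, and it assumes the shared object is a secure view.

```python
import re

# Plain unquoted identifiers only; anything else is rejected rather than escaped.
_PART = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _name(value: str, max_parts: int = 1) -> str:
    """Validate a dot-separated identifier such as db or org.account."""
    pieces = value.split(".")
    if len(pieces) > max_parts or not all(_PART.match(p) for p in pieces):
        raise ValueError(f"unexpected identifier: {value!r}")
    return value


def provider_share_sql(share, database, schema, view, consumer_account):
    """Statements the provider runs to publish one view through a share."""
    share, database, schema, view = (_name(v) for v in (share, database, schema, view))
    consumer_account = _name(consumer_account, max_parts=2)
    return [
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
        f"GRANT SELECT ON VIEW {database}.{schema}.{view} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


def consumer_mount_sql(local_database, provider_account, share):
    """Statement the consumer runs to mount the share as a read-only database."""
    local_database, share = _name(local_database), _name(share)
    provider_account = _name(provider_account, max_parts=2)
    return f"CREATE DATABASE {local_database} FROM SHARE {provider_account}.{share}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for stmt in provider_share_sql(
        "sales_share", "sales_db", "public", "orders_v", "partner_org.partner_acct"
    ):
        print(stmt + ";")
```

No data is copied by any of these statements; the consumer queries the provider's storage using its own warehouse.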

For Different Region or Different Cloud Platform

Assuming the 3rd party we want to share data with is a Snowflake customer in a different region or on a different cloud platform:

Note: Snowflake uses database replication to allow data providers to securely share data with data consumers across different regions and cloud platforms. A new account needs to be created in the same region as the 3rd party, and the data needs to be replicated to that account before a share can be created from it.

Process Flow

Figure 2

Pros:

  1. Real Time Access
  2. Consistent Data Across Multiple Consumers, as data providers only need to create one copy of the dataset per region rather than one copy per consumer
  3. Better Data Governance (Controlled, Customised Views)
  4. Simple to Implement (No extracts to build, no APIs to write, no additional software to install etc)

Cons:

  1. Data needs to be Replicated to the same region and cloud provider
  2. Secure data sharing across regions or cloud platforms is not allowed when the share includes one or more external tables
  3. Entire Database needs to be replicated
  4. Refresh is charged (Credits involved)
  5. Sharing to or from Virtual Private Snowflake (VPS) is currently not supported

Further information: https://docs.snowflake.com/en/user-guide/secure-data-sharing-across-regions-plaforms.html
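
The replication step described in the note above can be sketched as the SQL each account would run. This is an illustrative sketch under assumptions: the database and account names are hypothetical, accounts are assumed to be in `org_name.account_name` form, and the refresh would normally be scheduled rather than run by hand. Once the replica exists, a share is created from it in the target account in the usual way.

```python
import re

_PART = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _name(value: str, parts: int) -> str:
    """Allow only dot-separated plain identifiers with the expected number of parts."""
    pieces = value.split(".")
    if len(pieces) != parts or not all(_PART.match(p) for p in pieces):
        raise ValueError(f"unexpected identifier: {value!r}")
    return value


def replication_sql(database, source_account, target_account):
    """Group the replication statements by the account that has to run them."""
    db = _name(database, 1)
    source = _name(source_account, 2)  # assumed org_name.account_name form
    target = _name(target_account, 2)
    return {
        "run_in_source_account": [
            f"ALTER DATABASE {db} ENABLE REPLICATION TO ACCOUNTS {target}",
        ],
        "run_in_target_account": [
            f"CREATE DATABASE {db} AS REPLICA OF {source}.{db}",
            # Each refresh consumes credits, so the schedule drives the cost.
            f"ALTER DATABASE {db} REFRESH",
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    plan = replication_sql("sales_db", "our_org.primary_acct", "our_org.remote_acct")
    for where, statements in plan.items():
        print(f"-- {where}")
        for stmt in statements:
            print(stmt + ";")
```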

3rd party not a Snowflake Customer

Assuming the 3rd party is not a Snowflake customer, a Snowflake Reader Account can be created to share data. The share is read-only. Compute credits used by the Reader Account are billed to our (provider) account and can be recharged to the 3rd party. Threshold limits can be set on the compute warehouse with a Resource Monitor to control the costs if needed.

Process Flow

Figure 3

Pros:

  1. No Data Movement
  2. Real Time Access
  3. Consistent Data Across Multiple Consumers
  4. Better Data Governance (Controlled, Customised Views)
  5. Audit and access logs
  6. Data can be revoked as needed

Cons:

  1. Credits charged to Provider Account (Can restrict the credit usage through Resource Monitor)
  2. Bad queries can lead to additional costs
  3. No DML Operations can be performed on the Reader Account

Further information: https://docs.snowflake.com/en/user-guide/data-sharing-reader-create.html
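
To illustrate the Resource Monitor mentioned above, the sketch below builds the two statements that cap credit usage on a Reader Account warehouse. The monitor name, warehouse name, quota and notification threshold are all hypothetical, and the sketch assumes the statements are run by an administrator inside the Reader Account.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def resource_monitor_sql(monitor, warehouse, credit_quota, notify_pct=80):
    """Create a monitor that notifies early and suspends the warehouse at 100% of quota."""
    for name in (monitor, warehouse):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not isinstance(credit_quota, int) or credit_quota <= 0:
        raise ValueError("credit_quota must be a positive whole number of credits")
    if not 0 < notify_pct < 100:
        raise ValueError("notify_pct must be between 0 and 100 (exclusive)")
    return [
        f"CREATE RESOURCE MONITOR {monitor} WITH CREDIT_QUOTA = {credit_quota} "
        f"TRIGGERS ON {notify_pct} PERCENT DO NOTIFY ON 100 PERCENT DO SUSPEND",
        f"ALTER WAREHOUSE {warehouse} SET RESOURCE_MONITOR = {monitor}",
    ]


if __name__ == "__main__":
    # Hypothetical names and quota, for illustration only.
    for stmt in resource_monitor_sql("reader_cap", "reader_wh", 50):
        print(stmt + ";")
```

`DO SUSPEND` lets running queries finish before the warehouse stops, so a small overshoot of the quota is still possible.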

Pattern 2 - REST APIs

Build APIs that fetch the data from Snowflake or S3. Consumers pull the data using keys/tokens that carry the appropriate permissions/roles.

Pros:

  1. Secured - via SSL/TLS and API keys/tokens
  2. Consistent Data Across Multiple Consumers

Cons:

  1. Lack of central Governance
  2. Overhead of key rotation and management
  3. More development time, ongoing maintenance, and support effort
  4. Management of multiple versions of APIs (interface changes)
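
From the consumer's side, a token-secured pull might look like the sketch below. It is only a sketch: the base URL, the `/datasets/` path, the version segment and the bearer-token scheme are assumptions for illustration, not a description of an existing API.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode, urlsplit


def build_data_request(base_url, dataset, token, params=None):
    """Build (but do not send) an authenticated GET request for one dataset."""
    if urlsplit(base_url).scheme != "https":
        raise ValueError("data APIs should only be called over HTTPS")
    if not token:
        raise ValueError("an API token is required")
    query = f"?{urlencode(params)}" if params else ""
    url = f"{base_url.rstrip('/')}/datasets/{quote(dataset, safe='')}{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_json(request, timeout=30):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical endpoint and token; the version lives in the base URL so that
    # interface changes can be released side by side with the old version.
    req = build_data_request("https://api.example.com/v1", "orders", "example-token", {"limit": 100})
    print(req.get_method(), req.full_url)
```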

Pattern 3 - S3 Gateway Endpoint

Consumers can implement a gateway endpoint to our S3 bucket and we provide a role. They then pull data over the AWS backbone without traversing the internet.

Pros:

  1. Secured - Data will be accessed through the AWS Backbone
  2. Data staged within our environments
  3. Lifecycle policies can be applied to set an access deadline

Cons:

  1. Lack of Central Data Governance
  2. Overhead of lifecycle policies
  3. Storage costs could rise if large objects are shared for very long periods
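
One way to make sure shared objects are only read through the consumer's gateway endpoint is a bucket policy statement keyed on `aws:SourceVpce`. The sketch below builds such a policy document. The bucket name, prefix and endpoint ID are hypothetical, and the policy is an assumption about how the control could be applied rather than our deployed configuration. Note that it denies reads of the shared prefix from anywhere else, including our own tooling outside that endpoint.

```python
import json


def vpce_only_read_policy(bucket, prefix, vpc_endpoint_id):
    """Deny object reads under a prefix unless they arrive via one VPC endpoint."""
    if not vpc_endpoint_id.startswith("vpce-"):
        raise ValueError("expected a VPC endpoint ID such as vpce-0123456789abcdef0")
    prefix = prefix.strip("/")
    if not bucket or not prefix:
        raise ValueError("bucket and prefix are both required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadsOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket, prefix and endpoint ID, for illustration only.
    policy = vpce_only_read_policy("example-shared-bucket", "partner-a/", "vpce-0123456789abcdef0")
    print(json.dumps(policy, indent=2))
```

The role we provide still needs its own allow permissions for the prefix; this statement only narrows where requests may come from.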

Pattern 4 - Presigned URL via central management

We can provide 3rd Party consumers with a pre-signed URL for secure access to S3 objects for a limited time period - they use the URL to pull the data.

Pros:

  1. Consumers can access S3 objects without the need for AWS credentials or IAM permissions
  2. Access and Permission can be controlled by the S3 Bucket Owner

Cons:

  1. Temporary Access: Consumers can’t access the data once the expiry time has lapsed and must request a new URL
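
Generating a time-limited download link can be sketched as below. The helper takes an S3 client as a parameter (in practice `boto3.client("s3")`, which is assumed to be installed and configured) and calls its `generate_presigned_url` method. The bucket, key and default one-hour expiry are hypothetical choices for illustration.

```python
# Presigned URLs signed with Signature Version 4 are valid for at most 7 days.
MAX_EXPIRY_SECONDS = 7 * 24 * 60 * 60


def presign_download(s3_client, bucket, key, expires_in=3600):
    """Return a URL that allows a GET of one object until the expiry lapses."""
    if not 0 < expires_in <= MAX_EXPIRY_SECONDS:
        raise ValueError(f"expires_in must be between 1 and {MAX_EXPIRY_SECONDS} seconds")
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


if __name__ == "__main__":
    import boto3  # assumption: boto3 is installed and AWS credentials are configured

    # Hypothetical bucket and key, for illustration only.
    url = presign_download(boto3.client("s3"), "example-shared-bucket", "exports/orders.csv")
    print(url)
```

If the URL is signed with temporary credentials, for example from an assumed role, it stops working when those credentials expire, even if the requested expiry is longer.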

Pattern 5 - Data sharing via SFTP (Backward compatibility for legacy applications)

Process Flow

Figure 4

Pros:

  1. Quick and dirty!
  2. Can be used to link up with legacy technologies

Cons:

  1. Data needs to be encrypted prior to transit
  2. Data could potentially be sent to the wrong server if not handled properly
  3. Key rotation and management overhead
  4. Not suited to sharing real-time data
  5. Lack of governance and audit trail

Appendix