Siesta S3 Bucket Naming Conventions

urn:js:virtue:aspire:proposal:34.1

TL;DR

How and why Siesta S3 Bucket names need their own convention

Rationale

Differences from standard ASPIRE S3 Buckets

Siesta buckets are similar to standard ASPIRE S3 buckets, but differ in their deployment and behaviour. The similarities are:

They are used to store data destined for Aspire
They contain “raw” data that are uploaded by business users, similar to a Raw ASPIRE bucket
They exist across multiple environments: dev, preprod and prod.

The key differences include:

The source of data is from User Generated Content rather than a fixed data source
This is data submitted by humans, rather than automated data pipelines
Folder structure within a Siesta Bucket is simplified to accommodate non-technical users and users who do not have context of Data Tech processes.
There are no curated buckets
Business users are given write-only access to buckets.
Buckets are provisioned automatically using the Siesta application.
Buckets are hosted in a single account on behalf of other accounts

Bucket Naming convention of ASPIRE-Siesta Data Storage

Siesta has only one type of bucket, and its name follows this format:

`siesta-{Target AWS Account}-{Environment}-{Logical Component}`

All buckets shall meet the following standards:

Be lowercase
Be hyphen delimited
Be prefixed with siesta for clear discovery and identification

Other principles are broken down into separate sections for ease of understanding:

Target AWS Account
Environment
Logical Component
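
The format and standards above can be sketched as a small helper that assembles a bucket name from its three parts. This is an illustrative sketch only, not the Siesta application's actual provisioning code: the function name `build_bucket_name`, the example account `mango`, and the assumption that the RAW component is written as lowercase `raw` in the name are all hypothetical. The 63-character check reflects the general S3 limit on bucket name length.

```python
import re

# Environment values taken from this proposal.
VALID_ENVIRONMENTS = ("dev", "preprd", "prd")
# Assumption: the RAW logical component appears as lowercase "raw" in names.
VALID_COMPONENTS = ("raw",)
# Lowercase alphanumeric words separated by single hyphens.
_ACCOUNT_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_bucket_name(target_account: str, environment: str, component: str = "raw") -> str:
    """Return siesta-{Target AWS Account}-{Environment}-{Logical Component}."""
    account = target_account.strip().lower()  # names must be lowercase
    if not _ACCOUNT_PATTERN.fullmatch(account):
        raise ValueError(f"invalid target account name: {target_account!r}")
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"environment must be one of {VALID_ENVIRONMENTS}")
    if component not in VALID_COMPONENTS:
        raise ValueError(f"logical component must be one of {VALID_COMPONENTS}")
    name = f"siesta-{account}-{environment}-{component}"
    if len(name) > 63:  # S3 bucket names may be at most 63 characters
        raise ValueError(f"bucket name too long ({len(name)} characters)")
    return name


print(build_bucket_name("Mango", "dev"))
```

Passing the target squad's account name, rather than the account hosting Siesta, is left to the caller in this sketch.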

Target AWS Account

Siesta automatically provisions S3 buckets on behalf of other squads. The AWS Account in the bucket name must not be the account hosting the Siesta application; it must be the account of the target squad running the bucket.

This is typically the name of the IRM and should coincide with the Fruit Squad’s own naming convention for their AWS account.

Environment

Siesta buckets should specify the environment they belong to within their name, for ease of identification. Additionally, buckets must not be hosted outside of their own environment.

Valid parameters:

dev - Bucket used for development purposes. Files landed here will undergo bleeding-edge processing based on the latest code.
preprd - Bucket used for testing based on the latest stable code. Files landed here will go through pre-approved processing based on the latest approved release. The release shall always match the prd bucket.
prd - Bucket used for submitting actual data. Files will be processed based on the latest stable release.
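
As a rough illustration of how the environment values above could be checked on an existing name, the sketch below splits a bucket name back into its parts. It reads the name from the right, because the environment and logical component come from fixed sets while the target account may itself contain hyphens. The names `SiestaBucket` and `parse_bucket_name`, the example account `mango-squad`, and the lowercase `raw` component are assumptions made for this example.

```python
from typing import NamedTuple

VALID_ENVIRONMENTS = ("dev", "preprd", "prd")
VALID_COMPONENTS = ("raw",)  # assumption: RAW is written lowercase in names


class SiestaBucket(NamedTuple):
    target_account: str
    environment: str
    component: str


def parse_bucket_name(name: str) -> SiestaBucket:
    """Split a Siesta bucket name into account, environment and component."""
    if name != name.lower():
        raise ValueError("bucket names must be lowercase")
    parts = name.split("-")
    if len(parts) < 4 or parts[0] != "siesta":
        raise ValueError("expected siesta-{account}-{environment}-{component}")
    # The last two parts are fixed-vocabulary; everything between the
    # prefix and them is the (possibly hyphenated) target account.
    *account_parts, environment, component = parts[1:]
    if not all(account_parts):
        raise ValueError("target account contains an empty segment")
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if component not in VALID_COMPONENTS:
        raise ValueError(f"unknown logical component: {component!r}")
    return SiestaBucket("-".join(account_parts), environment, component)


bucket = parse_bucket_name("siesta-mango-squad-preprd-raw")
print(bucket.target_account, bucket.environment, bucket.component)
```

A name such as `siesta-mango-prod-raw` is rejected here because `prod` is not one of the valid parameters.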

Logical Component

RAW:

This is the only type of logical component bucket that exists within Siesta
It represents the landing area for users to upload their Excel spreadsheets
There is no “Extract” naming convention for this

Implications

None.

Appendix

Migrated From Confluence

Original Author: Chowdhury, Dan