
No Data Without Metadata

urn:js:virtue:aspire:principle:1.1

TL;DR

New data feeds will not be accepted for ingestion unless the appropriate metadata is provided.

Rationale

New data feeds will not be accepted for ingestion unless the appropriate metadata is provided. Metadata must also be provided for any datasets or data elements that are subsequently created or derived from the received data. This applies to the full cycle, from initial receipt through to provisioning. Note: Appendix 5.1 outlines the types of metadata that will be required.
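As an illustration only, the rule above could be enforced by a small gate at the front of an ingestion pipeline. This is a minimal sketch under stated assumptions: the field names in `REQUIRED_FIELDS`, the `Dataset` class and the `FeedRejected` exception are hypothetical placeholders, and the authoritative list of required metadata is the one in Appendix 5.1.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the authoritative list is in Appendix 5.1.
REQUIRED_FIELDS = ("owner", "source_system", "schema", "classification")


class FeedRejected(Exception):
    """Raised when a feed or derived dataset lacks required metadata."""


@dataclass
class Dataset:
    name: str
    metadata: dict = field(default_factory=dict)


def missing_metadata(dataset: Dataset) -> list:
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not dataset.metadata.get(f)]


def accept_for_ingestion(dataset: Dataset) -> Dataset:
    """Accept the dataset only if every required metadata field is present."""
    missing = missing_metadata(dataset)
    if missing:
        raise FeedRejected(f"{dataset.name}: missing metadata {missing}")
    return dataset
```

Because the principle covers derived datasets as well as received feeds, the same check would be applied each time a new dataset is created, not only at initial receipt.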

Metadata is needed to correctly and efficiently manage the ingestion, storage and subsequent usage of the data. The key drivers are:

  • Data provenance & lineage – The metadata will provide data provenance as well as business and technical lineage of the ingested datasets and data elements.
  • Impact assessment – The lineage will allow timely and cost-effective technical assurance and impact assessment for change projects and new projects.
  • Data classification – The metadata will facilitate the categorisation of new data elements in business terminology, the identification of synonyms, security and sensitivity classifications as well as further enrichment by attaching business metadata and workflow for data stewardship.
  • Data discovery – Metadata will enable users to search and explore datasets by concept, variable name and links to similar concepts (semantics etc.), which will be particularly important for data science and exploration.
  • Data governance – A good understanding of the data will expedite the addressing of data quality issues as well as the enforcement of data security and privacy requirements.
  • Process resilience – Where metadata describes the source data structures, data processing can adapt to changes in those structures.
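To make the lineage and impact assessment drivers concrete, the sketch below walks a recorded lineage graph to list every dataset that would be affected by a change to a source. The dataset names and the `LINEAGE` mapping are invented for illustration; a real implementation would read lineage from whatever metadata store the organisation adopts.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets derived directly from it.
LINEAGE = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_revenue", "customer_ltv"],
    "daily_revenue": ["finance_report"],
}


def downstream(lineage: dict, start: str) -> list:
    """Breadth-first walk returning every dataset derived from `start`."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# A change to raw_orders touches everything derived from it:
# ['clean_orders', 'daily_revenue', 'customer_ltv', 'finance_report']
print(downstream(LINEAGE, "raw_orders"))
```

Without lineage metadata for derived datasets, this traversal would stop at the first undocumented hop, which is one reason the principle extends beyond the initial feed.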

Implications

The potential implications are:

  • User access – Users will not have access to data unless metadata is provided.
  • Upfront effort – The organisation as a whole will need to invest in the creation and management of metadata.
  • Understand business process – Both the upstream data sourcing team and the downstream developers must understand the relationship between business process and data so that meaningful metadata can be ingested. This is part of treating data as an asset.
  • Standards & guidelines – Robust standards and guidelines need to be defined for creating, processing, provisioning and subsequently disseminating metadata.
  • Effective metadata governance – Governance processes and policies need to be in place to ensure that metadata is managed efficiently and effectively.
  • Solutions and processes – Solutions and processes need to be put in place for the effective and efficient collation and management as well as search, discovery and utilisation of metadata.