Data Warehouse Architecture — An Overview

A data warehouse is the defacto source of business truth developed by combining data from multiple disparate sources. It supports analytical reporting, and both structured and ad hoc queries. Data warehousing systems, like home designs, have many different architectural options. Some have Operational Data Stores (ODS), others are deployed with data marts. Some data warehouse may reference finite set of source data, or as with most enterprise data warehouses, reference a variety of internal and external data sources. Because constructing a data warehouse is unique to the business use, we will look at the common layers found in all data warehouse architecture. All data warehouse architecture includes the following layers:

  • Data Source Layer
  • Data Staging Layer
  • Data Storage Layer
  • Data Presentation Layer

Data source layer

The data source layer of data warehouse architecture is where original data, collected from a variety internal and external sources, resides in the relational database. Examples of source data types include but are not limited to: These stores disparate data types including:

  • Operational Data — Product data, inventory data, marketing data, or HR data.
  • Social Media Data — Web site hits, content popularity, contact page completion.
  • Third-party data — Demographic data, survey data, census data.

While most data warehouse architecture deals with structured data, consideration should be given to the future use of unstructured data sources, such as voice recordings, scanned images, and unstructured text. These streams of data are valuable silos of information and should be considered when developing your data warehouse.

Data Staging Layer

The data staging layer resides between data sources and the data warehouse. In this layer, data is extracted from different internal and external data sources. Because source data comes in many different formats, the data extraction layer will utilize multiple technologies and tools to extract the required data. Once the extracted has been loaded, it will be subjected to high-level data quality checks. The final result will be clean and organized data that you will load into your data warehouse. The staging layer contains the following components:

  • Landing Database and Staging Area
  • Data Integration Tool (ETL)

Landing Database and Staging Area

The landing database stores the data retrieved from the data source. Staging is used to apply quality checks on the data before moving it to the data warehouse. Staging is an essential step in data warehouse architecture. Poor data will amount to inadequate information and result is poor business decision making. The staging layer s also where you want to make adjustments to the schema to handle unstructured data sources.

Data integration tool

Extract, Transform and Load tools (ETL) are the data integration tools used to extract data from source systems, transform and prepare data and load into the data warehouse. Panoply.io product provides this entire process, easily and quickly. All you need to do is point it to your data source(s).

Data Storage Layer

The data storage layer is where data that was cleansed in the staging area is stored as a single central repository. Depending on your business and your data warehouse architecture requirements, your data storage may be a data warehouse, data mart (data warehouse partially replicated for specific departments), or an Operational Data Store (ODS).

Data Presentation Layer

The presentation layer is where users interact with the cleansed and organized. This layer of the data warehouse architecture provides users with the ability to query the data for product or service insights, analyze the information to conduct hypothetical business scenarios, and develop automated or ad-hoc reports. You may employ an OLAP or reporting tool with a user-friendly Graphical User Interface (GUI) to help users build their queries, perform analysis, or design their reports.

When planning your data warehouse, create one that will handle both structured and unstructured data and is cross-functional. It should also provide a long-term foundation for data provision and decision support.

Published by: Limor Wainstein