Dimensional Modeling – Introduction – Azure and AWS Cloud Guide

Dimensional Modeling

Dimensional modeling is a more specialized approach that is optimized for OLAP (online analytical processing) systems. Dimensional models organize data into a star or snowflake schema, with a fact table at the center and several dimension tables surrounding it.

The fact table contains the measures (i.e., numerical data) that are being analyzed, while the dimension tables provide the context (i.e., descriptive data) for the measures.

Also, dimensional modeling is optimized for query performance, making it well suited for OLAP and especially reporting systems. Because data is organized into a star or snowflake schema, it is easier to perform aggregations and analyses, and it is faster to query large datasets. Dimensional models are also easier to understand and maintain, making them more accessible to business users. However, dimensional models can be less flexible and more complex to set up, and they may not perform as well with transactional data.

In conclusion, both tabular and dimensional modeling have their places in data warehousing, and the choice between them depends on the specific needs of your organization. Tabular modeling is more suited to handling large volumes of transactional data, while dimensional modeling is optimized for OLAP systems and faster query performance.

In modern warehousing with data and delta lakes, tabular models structured in facts and dimensions are still effective. There are multiple tools available to balance between extreme normalization and extreme classification. While tabular models provide a simpler structure and facilitate querying of data, dimensional models make it more ready for analytics and reporting needs.

Understanding Dimensional Modeling in Brief

In data warehousing, dimensions, facts, and measures are essential concepts that are used to organize and analyze data. A dimension is a category of data that provides context for a fact, while a fact is a value that describes a specific event or activity. Measures are numerical values that quantify facts.

Dimensions

A dimension is a grouping or category of data that provides context for a fact.

Dimensions can be thought of as the “who, what, when, where, and why” of a dataset.

For example, a sales dataset might include dimensions such as date, product, customer, and location. Each of these dimensions provides additional information about the sales data and helps to contextualize it.

Dimensions can be further classified into the following types:

• Degenerate Dimension: A degenerate dimension is a dimension that is not stored in a separate dimension table but is included in the fact table.

• Conformed Dimension: A conformed dimension is a dimension that is used in multiple fact tables in the same data warehouse. It is designed to ensure the consistency and accuracy of data across the different fact tables. For example, let’s consider a retail company that sells products through multiple channels, such as brick and mortar stores, online stores, and mobile apps. The company has a data warehouse that stores data about sales, inventory, and customer behavior.

In this scenario, the “customer” dimension is a good example of a conformed dimension. The customer dimension contains attributes such as customer name, address, age, gender, and purchase history. This dimension is used in multiple fact tables, such as sales fact table, customer behavior fact table, and inventory fact table.

By using a conformed dimension for customer data, the data warehouse ensures that all the information related to customers is consistent and accurate across all the fact tables. It also simplifies the data model and reduces the risk of data inconsistencies and errors.

Another advantage of using conformed dimensions is that they can be reused across multiple data marts or data domains. This means that the same customer dimension can be used for sales analysis, customer behavior analysis, and inventory management analysis without duplicating the data or creating a separate dimension for each fact table.

• Junk Dimension: A junk dimension is a collection of flags and indicators that are not related to any specific dimension. These flags and indicators are grouped together into a single dimension table to simplify the data model and improve query performance.

Junk dimensions are used when you have many low-cardinality flags that are not related to any specific dimension, and it’s not worthwhile to create a separate dimension for each flag.

The name junk comes from the fact that the dimension contains seemingly unrelated attributes that don’t fit neatly into any other dimension. Examples of attributes that can be included in a junk dimension are as follows:

• Boolean indicators: “yes” or “no” flags that describe the presence or absence of a particular condition

• Flags: “on” or “off” indicators that specify the status of a particular process or workflow

• Codes: short codes that describe the result of a particular event or transaction

• Dates: dates or timestamps that indicate when a particular event occurred

By consolidating these attributes into a single dimension table, you can simplify the data model and improve query performance. The junk dimension table acts as a bridge table between the fact table and the other dimensions in the data model.

For example, let’s consider an e-commerce website that sells products online. The website has a fact table that records the sales transactions and several dimensions, such as product, customer, and time. The fact table contains several low-cardinality flags, such as “shipped,” “cancelled,” “returned,” and “gift-wrapped,” which are not related to any specific dimension. Instead of creating a separate dimension table for each flag, these flags can be consolidated into a junk dimension table. The junk dimension table will have a record for each unique combination of these flags, and the fact table will reference the junk dimension table using a foreign key.

Junk dimensions can be an effective way to simplify a data model and reduce the number of dimension tables required in a data warehouse. However, care should be taken to ensure that

the attributes in the junk dimension are truly unrelated and do not belong in any other dimension. Otherwise, the use of a junk dimension can lead to data-quality issues and analysis errors.

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Azure and AWS Cloud Guide

Dimensional Modeling – Introduction

Roy Egbokhan

Leave a Reply Cancel reply

Related Posts

Objective – Introduction

Traditional Big Data Technologies – Introduction

Data Lakes as Data Warehouses – Introduction

Roy Egbokhan

Leave a Reply Cancel reply