FIGURE 5-27 With a proper dimension, you can easily slice by age range.
You can obtain a good data model by separating the rapidly changing attribute from the original dimension and storing it as a value in the fact table or, if needed, building a proper dimension on top of the attribute. The resulting loading process is much easier—and the data model is much simpler—than with a fully featured SCD.
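The following minimal Python sketch illustrates the idea. It assumes hypothetical `Sale` rows that already carry the customer's birth date. The customer's age at the time of each sale is computed once and stored on the sale row. A small, hypothetical `AGE_RANGES` table then plays the role of the dimension built on top of that attribute. In Power BI you would typically do this in Power Query or with calculated columns; the Python only shows the logic.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Sale:
    customer: str
    birth_date: date   # hypothetical: copied from the customer table
    order_date: date
    amount: float

# Hypothetical age-range "dimension": (label, min age inclusive, max age exclusive)
AGE_RANGES = [
    ("Under 30", 0, 30),
    ("30-49", 30, 50),
    ("50 and over", 50, 200),
]

def age_at(birth: date, on: date) -> int:
    """Age in whole years on the given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)

def age_range(age: int) -> str:
    """Look up the label of the age range that contains `age`."""
    for label, low, high in AGE_RANGES:
        if low <= age < high:
            return label
    raise ValueError(f"no range for age {age}")

def add_age_columns(sales):
    """Denormalize the rapidly changing attribute (age) into each sale row."""
    rows = []
    for s in sales:
        age = age_at(s.birth_date, s.order_date)
        rows.append({"customer": s.customer, "amount": s.amount,
                     "age": age, "age_range": age_range(age)})
    return rows

def sales_by_age_range(rows):
    """Slice the sales amount by the age-range attribute."""
    totals = {label: 0.0 for label, _, _ in AGE_RANGES}
    for r in rows:
        totals[r["age_range"]] += r["amount"]
    return totals

# The same customer falls into two ranges without versioning the customer.
sales = [
    Sale("Ann", date(1980, 6, 15), date(2009, 6, 14), 100.0),
    Sale("Ann", date(1980, 6, 15), date(2010, 6, 15), 50.0),
]
rows = add_age_columns(sales)
totals = sales_by_age_range(rows)
```

Because the age is frozen on each sale, the customer table keeps one row per customer and no SCD loading logic is required.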
Choosing the right modeling technique
In this chapter, we have shown two different methods for handling changing dimensions. The canonical way is to create a fully featured SCD with a rather complex loading process. The simpler way is to store the slowly changing attribute as a column in the fact table, and, if needed, to build a proper dimension on top of the attribute.
The latter solution is much simpler to develop, so it is sometimes the best way to handle SCDs, especially if you can easily isolate a single slowly changing attribute. However, if many attributes change over time, you might end up with too many dimensions, making the data model difficult to browse. As often happens in data modeling, think carefully before choosing one solution over the other. For example, if you want to track several historical attributes for the customer, such as age, full address (country/region, state, and continent), the country or region sales manager, and possibly others, you can end up building many dimensions for the sole purpose of tracking all those attributes. On the other hand, no matter how many changing attributes a dimension has, if you go for the fully featured SCD, you have to maintain only a single dimension.
Let us go back to the example used throughout this chapter: the handling of the current and historical sales manager. If, instead of focusing on the dimension, you focus on the attribute alone, you can easily solve the scenario by using the model shown in Figure 5-28.
FIGURE 5-28 Denormalizing the historical manager in the fact table leads to a simple model.
Building the model is straightforward. You only need to compute, for each sale, the sales manager assigned to the customer’s country or region at the time of the sale. You can obtain this with a couple of merge operations—and, most importantly, without having to update the granularity of either the fact table or the dimension.
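The two merges can be sketched in Python under assumed table shapes: a hypothetical `customers` mapping from customer to country or region, and a hypothetical `manager_history` list of `(country, manager, valid_from)` rows in which each assignment lasts until the next one for the same country. The first lookup brings the country onto the sale; the second finds the assignment in effect on the sale date using a binary search over start dates. In Power Query the second step would usually be a merge followed by a filter on the date range; this sketch only illustrates the logic.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical customer table: customer -> country/region
customers = {"Contoso": "Italy", "Fabrikam": "Germany"}

# Hypothetical history of sales managers: (country, manager, valid_from)
manager_history = [
    ("Italy", "Louis", date(2008, 1, 1)),
    ("Italy", "Paul", date(2009, 7, 1)),
    ("Germany", "Anna", date(2008, 1, 1)),
]

def build_manager_index(history):
    """Group assignments by country, sorted by start date, for range lookups."""
    index = defaultdict(list)
    for country, manager, valid_from in history:
        index[country].append((valid_from, manager))
    for periods in index.values():
        periods.sort()
    return index

def manager_on(index, country, when):
    """Return the manager assigned to `country` on date `when`, or None."""
    periods = index.get(country, [])
    starts = [start for start, _ in periods]
    pos = bisect_right(starts, when) - 1  # last assignment starting on or before `when`
    return periods[pos][1] if pos >= 0 else None

def add_historical_manager(sales, customers, history):
    """Denormalize the historical sales manager into each sale row."""
    index = build_manager_index(history)
    result = []
    for sale in sales:
        country = customers[sale["customer"]]              # first merge
        manager = manager_on(index, country, sale["date"])  # second merge
        result.append({**sale, "country": country, "historical_manager": manager})
    return result

sales = [
    {"customer": "Contoso", "date": date(2009, 3, 10), "amount": 120.0},
    {"customer": "Contoso", "date": date(2009, 9, 2), "amount": 80.0},
    {"customer": "Fabrikam", "date": date(2009, 5, 5), "amount": 60.0},
]
enriched = add_historical_manager(sales, customers, manager_history)
```

Neither the sales rows nor the customer table changes granularity: each sale simply gains a column holding the manager that was valid on its date, while the current manager can still be read from the customer's country or region.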
Regarding SCDs, here is a simple rule of thumb: if possible, isolate the slowly changing attribute (or set of attributes) and build a separate dimension for it. This way, you do not need to change the granularity of either the fact table or the dimension. If the number of attributes is too large, then the best option is to go for the much more complex process of building a full SCD.
Conclusions
SCDs are not easy to manage. Yet, in many cases, it is important to use them because you want to track what happened in a relationship and attempt to predict what might happen in the future. The following are the important points to remember from this chapter:
- What changes is not the dimension; it is a set of attributes of the dimension. Thus, the proper way to express the changing nature of your data is to identify which attributes are slowly changing.
- You use historical attributes when analyzing the past. You use current attributes when projecting the current data to forecast the future.
- If you have a small set of slowly changing attributes, you can safely denormalize them in the fact table. If a dimension is needed for those attributes, you can build it as a separate historical dimension.
- If the number of attributes is too large, you must follow the SCD pattern, knowing that the loading process will be much more complex and error-prone.
- If you build an SCD, you must move the granularity of both the fact table and the dimension to the version of the entity instead of the original entity.
- When you manage SCDs, most counting calculations must be adjusted to handle the new granularity, typically by using a distinct count instead of a simple count.
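To illustrate the last two points, here is a small Python sketch with a hypothetical SCD customer table. Each row is a version of a customer identified by a surrogate key (`customer_key`), while the natural key (`customer_code`) repeats across versions. Counting rows counts versions; counting distinct natural keys counts customers, which corresponds to using DISTINCTCOUNT on the natural-key column in DAX rather than COUNTROWS on the dimension.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerVersion:
    customer_key: int    # surrogate key: one per version
    customer_code: str   # natural key: shared by all versions of a customer
    country: str
    manager: str         # slowly changing attribute

# Hypothetical SCD: customer C001 has two versions because its manager changed
customer_versions = [
    CustomerVersion(1, "C001", "Italy", "Louis"),
    CustomerVersion(2, "C001", "Italy", "Paul"),
    CustomerVersion(3, "C002", "Germany", "Anna"),
]

# Fact rows reference the version (surrogate key), not the original customer
sales = [
    {"customer_key": 1, "amount": 100.0},
    {"customer_key": 2, "amount": 40.0},
    {"customer_key": 3, "amount": 60.0},
]

def count_rows(versions):
    """Simple count: counts versions, not customers."""
    return len(versions)

def count_customers(versions):
    """Distinct count on the natural key: counts actual customers."""
    return len({v.customer_code for v in versions})

def customers_with_sales(versions, facts):
    """Distinct customers who bought something, resolved through the versions."""
    code_by_key = {v.customer_key: v.customer_code for v in versions}
    return len({code_by_key[f["customer_key"]] for f in facts})

italy = [v for v in customer_versions if v.country == "Italy"]
```

Filtering Italy returns two rows but only one customer, so a simple count would overstate the result as soon as a customer has more than one version.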