will not show the purchases of products sold. Instead, it will show the purchases of any product made on the dates where any of the selected products were sold, becoming even less intuitive. Bidirectional filtering is a powerful feature, but it is not an option in this case because you want finer control over the way the filtering happens.
The key to solving this scenario is to understand the flow of filtering. Let us start from the Date table and revert to the original model shown in Figure 3-5. When you filter a given year in Date, the filter is automatically propagated to both Sales and Purchases. However, because of the direction of the relationship, it does not reach Product. What you want to achieve is to find the products that are present in Sales and use this list of products as a further filter on Purchases. The correct formula for the measure is as follows:
PurchaseOfSoldProducts := CALCULATE (
    [PurchaseAmount],
    CROSSFILTER ( Sales[ProductKey], Product[Product
)
In this code, you use the CROSSFILTER function to activate the bidirectional filter between Product and Sales for only the duration of the calculation. In this way, by using standard filtering processes, Sales will filter Product, which then filters Purchases. (For more information on the CROSSFILTER function, see Appendix A, “Data modeling 101.”)
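The same filter flow can also be written explicitly. The following is a minimal sketch, not the formula used in this chapter. It assumes that Purchases contains a column named ProductKey (an assumed name) and that the engine version supports TREATAS. It takes the products visible in Sales under the current filter context and applies them as a filter on Purchases:

```dax
-- Sketch only: Purchases[ProductKey] is an assumed column name,
-- and PurchaseOfSoldProductsAlt is a hypothetical measure name.
PurchaseOfSoldProductsAlt :=
CALCULATE (
    [PurchaseAmount],
    -- Product keys with at least one sale in the current filter context,
    -- applied as a filter on the Purchases table
    TREATAS ( VALUES ( Sales[ProductKey] ), Purchases[ProductKey] )
)
```

Because the filter on Date still reaches Purchases through its own relationship, this variant should restrict purchases to the selected period and to products sold in that period, which is the same intent as the CROSSFILTER version.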
To solve this scenario, we relied only on DAX code. We did not change the data model. Why is this relevant to data modeling? Because in this case, changing the data model was not the right option, and we wanted to highlight this. Updating the data model is generally the right way to go, but sometimes, such as in this example, you must author DAX code to solve a specific scenario. It pays to develop the judgment to know when to change the model and when to write DAX. Besides, the data model in this case already consists of two star schemas, so it is very hard to build a better one.
Understanding model ambiguity
The previous section showed that setting a bidirectional filter on a relationship will not work because the model becomes ambiguous. In this section, we want to dig deeper into the concept of ambiguous models to better understand them and, more importantly, why they are forbidden in Tabular.
An ambiguous model is a model where there are multiple paths joining any two tables through relationships. The simplest form of ambiguity appears when you try to build multiple relationships between two tables. If you try to build a model where the same two tables are linked through multiple relationships, only one of them (by default, the first one you create) will be kept active. The other ones will be marked as inactive. Figure 3-8 shows an example of such a model. Of the three relationships shown, only one is solid (active), whereas the remaining ones are dotted (inactive).
FIGURE 3-8 You cannot keep multiple active relationships between two tables.
Why is this limitation present? The reason is straightforward: The DAX language offers multiple functionalities that work on relationships. For example, in Sales, you can reference any column of the Date table by using the RELATED function, as in the following code:
Sales[Year] = RELATED ( 'Date'[Calendar Year] )
RELATED works without you having to specify which relationship to follow. The DAX language automatically follows the only active relationship and then returns the expected year. In this case, it would be the year of the sale, because the active relationship is the one based on OrderDateKey. If you could define multiple active relationships, then you would have to specify which one of them to use each time you call RELATED. The same applies to the automatic propagation of the filter context whenever you define one by using, for example, CALCULATE.
The following example computes the sales in 2009:
Sales2009 := CALCULATE (
    [Sales Amount],
    'Date'[Calendar Year] = "CY 2009"
)
Again, you do not specify the relationship to follow. It is implicit in the model that the active relationship is the one using OrderDateKey. (In the next chapter, you will learn how to handle multiple relationships with the Date table in an efficient way. The goal of this section is simply to help you understand why an ambiguous model is forbidden in Tabular.)
You can activate a given relationship for a specific calculation. For example, if you are interested in the sales delivered in 2009, you can compute this value by taking advantage of the USERELATIONSHIP function, as in the following code:
Shipped2009 := CALCULATE (
    [Sales Amount],
    'Date'[Calendar Year] = "CY 2009",
    USERELATIONSHIP ( 'Date'[DateKey], Sales[Deliver
)
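The same pattern can be wrapped in a measure that does not hard-code the year, so that the period comes from whatever is selected in the Date table. This is only a sketch under an assumption: the delivery key column in Sales is written here as Sales[DeliveryDateKey], which is a hypothetical name, and Delivered Amount is a hypothetical measure name.

```dax
-- Sketch only: Sales[DeliveryDateKey] is a hypothetical column name.
Delivered Amount :=
CALCULATE (
    [Sales Amount],
    -- Activate the inactive relationship for this calculation only;
    -- the active relationship between the same two tables is ignored meanwhile
    USERELATIONSHIP ( Sales[DeliveryDateKey], 'Date'[DateKey] )
)
```

With a measure like this, slicing by 'Date'[Calendar Year] would group sales by the year of delivery instead of the year of the order, without the user having to know which key is involved.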
As a general rule, keeping inactive relationships in your model is useful only when you make very limited use of them or if you need the relationship for some special calculation. A user has no way to activate a specific relationship while navigating the model with the user interface. It is the task of the data modeler, not the user, to worry about technical details like the keys used in a relationship. In advanced models, where billions of rows are present in the fact table or the calculations are very complex, the data modeler might decide to keep inactive relationships in the model to speed up certain calculations. However, such optimization techniques are not necessary at the introductory level of data modeling covered here, where inactive relationships are nearly useless.
Now, let us go back to ambiguous models. As we said, a model might be ambiguous for multiple reasons, even if all those reasons are connected to the presence of multiple paths between tables. Another example of an ambiguous model is the one depicted in Figure 3-9.
FIGURE 3-9 This model is ambiguous, too, although the reason is less evident.
In this model, there are two different age columns. One is Historical Age, which is stored in the fact table. The other is CurrentAge, which is stored in the Customer dimension. Both of these columns are used as foreign keys to the Age Ranges table, but only one of the relationships is permitted to remain active. The other relationship is deactivated. In this case, ambiguity is a bit less evident, but it is there. Imagine you built a PivotTable and sliced it by age range. Would you expect to slice it by the historical age (how old each customer was at the moment of sale) or the current age (how old each customer is today)? If both relationships were kept active, this would be ambiguous. Again, the engine refuses to let you build such a model. It forces you to solve ambiguity by either choosing which relationship to maintain as active or duplicating the table. That way, when you filter either a Current Age Ranges or a Historical Age Ranges table, you specify a unique path to filter data. The resulting model, once the Age Ranges table has been duplicated, is shown in Figure 3-10.
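One possible way to duplicate the table, offered here only as a sketch, is to define two calculated tables on top of the existing Age Ranges table. This assumes an engine that supports calculated tables (for example, Power BI Desktop). Loading the table twice from the source is an equally valid option.

```dax
-- Sketch only: each line is a separate calculated table definition.
-- Both copy every row and column of the existing 'Age Ranges' table.
Current Age Ranges = 'Age Ranges'
Historical Age Ranges = 'Age Ranges'
```

Each copy would then need its own single active relationship: one from the current age column in Customer, the other from the historical age column in the fact table.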