- Contents at a glance
- Contents
- Introduction
  - Who this book is for
  - Assumptions about you
  - Organization of this book
  - Conventions
  - About the companion content
  - Acknowledgments
  - Errata and book support
  - We want to hear from you
  - Stay in touch
- Chapter 1. Introduction to data modeling
  - Working with a single table
  - Introducing the data model
  - Introducing star schemas
  - Understanding the importance of naming objects
  - Conclusions
- Chapter 2. Using header/detail tables
  - Introducing header/detail
  - Aggregating values from the header
  - Flattening header/detail
  - Conclusions
- Chapter 3. Using multiple fact tables
  - Using denormalized fact tables
  - Filtering across dimensions
  - Understanding model ambiguity
  - Using orders and invoices
  - Calculating the total invoiced for the customer
  - Calculating the number of invoices that include the given order of the given customer
  - Calculating the amount of the order, if invoiced
  - Conclusions
- Chapter 4. Working with date and time
  - Creating a date dimension
  - Understanding automatic time dimensions
  - Automatic time grouping in Excel
  - Automatic time grouping in Power BI Desktop
  - Using multiple date dimensions
  - Handling date and time
  - Time-intelligence calculations
  - Handling fiscal calendars
  - Computing with working days
  - Working days in a single country or region
  - Working with multiple countries or regions
  - Handling special periods of the year
  - Using non-overlapping periods
  - Periods relative to today
  - Using overlapping periods
  - Working with weekly calendars
  - Conclusions
- Chapter 5. Tracking historical attributes
  - Introducing slowly changing dimensions
  - Using slowly changing dimensions
  - Loading slowly changing dimensions
  - Fixing granularity in the dimension
  - Fixing granularity in the fact table
  - Rapidly changing dimensions
  - Choosing the right modeling technique
  - Conclusions
- Chapter 6. Using snapshots
  - Using data that you cannot aggregate over time
  - Aggregating snapshots
  - Understanding derived snapshots
  - Understanding the transition matrix
  - Conclusions
- Chapter 7. Analyzing date and time intervals
  - Introduction to temporal data
  - Aggregating with simple intervals
  - Intervals crossing dates
  - Modeling working shifts and time shifting
  - Analyzing active events
  - Mixing different durations
  - Conclusions
- Chapter 8. Many-to-many relationships
  - Introducing many-to-many relationships
  - Understanding the bidirectional pattern
  - Understanding non-additivity
  - Cascading many-to-many
  - Temporal many-to-many
  - Reallocating factors and percentages
  - Materializing many-to-many
  - Using the fact tables as a bridge
  - Performance considerations
  - Conclusions
- Chapter 9. Working with different granularity
  - Introduction to granularity
  - Relationships at different granularity
  - Analyzing budget data
  - Using DAX code to move filters
  - Filtering through relationships
  - Hiding values at the wrong granularity
  - Allocating values at a higher granularity
  - Conclusions
- Chapter 10. Segmentation data models
  - Computing multiple-column relationships
  - Computing static segmentation
  - Using dynamic segmentation
  - Understanding the power of calculated columns: ABC analysis
  - Conclusions
- Chapter 11. Working with multiple currencies
  - Understanding different scenarios
  - Multiple source currencies, single reporting currency
  - Single source currency, multiple reporting currencies
  - Multiple source currencies, multiple reporting currencies
  - Conclusions
- Appendix A. Data modeling 101
  - Tables
  - Data types
  - Relationships
  - Filtering and cross-filtering
  - Different types of models
  - Star schema
  - Snowflake schema
  - Models with bridge tables
  - Measures and additivity
  - Additive measures
  - Non-additive measures
  - Semi-additive measures
- Index
- Code Snippets
handled in the model, the measures become simple SUM operations, with no CALCULATE operations or filtering happening inside. From a maintainability point of view, this is extremely important because it means that any new formula will not need to repeat the filtering pattern that was mandatory in the previous models.
Hiding values at the wrong granularity
In previous sections, we attempted to address the granularity issue by moving Sales to the lower granularity of Budget, losing expressivity. Then we managed to merge the two fact tables in a single data model by using intermediate, hidden snowflaked dimensions, which let the user seamlessly browse Budget and Sales. Nevertheless, even if the user can browse budget values slicing by the product brand, he or she will not be able to slice by, say, product color. Color does not follow brand boundaries, and Budget contains no information at the color granularity. Let us examine this with an example. If you build a simple report that slices Sales and Budget by color, you obtain a result similar to what is shown in Figure 9-10.
FIGURE 9-10 Sales Amount is additive, whereas Budget 2009 is not. The sum of the rows is much higher than the grand total.
You might notice a pattern similar to the one that appears with many-to-many relationships. In fact, this is exactly what is happening. The report is not showing the budget for products of a given color, because the Budget fact table does not contain any information about individual products. It only knows about brands. The number shown is the value of Budget for any brand that has at least one product of the given color. This number has at least two problems. First, it is wrong. Second, it is difficult to spot that it is wrong.
You don’t want such a report to come out of your models. The best-case scenario is that users complain about the numbers. The worst-case scenario is that they make decisions based on wrong figures. As a data modeler, it is your responsibility to ensure that if a number cannot be computed out of the model, you clearly show the error and do not provide any answer. In other words, your code needs to contain some logic to guarantee that if a number is returned by your measures, that number is the right one. Returning a wrong result is, to state the obvious, not an option.
As you might imagine, the next question is this: How do you know that you should not return any value? This is easy, even if it requires some DAX knowledge. You must determine whether the PivotTable (or the report in general) is browsing data beyond the granularity at which the number still makes sense. If it is above the granularity, then you are aggregating values, which is fine. If it is below the granularity, then you are splitting values based on the granularity, even if you are showing them at a more detailed level. In such a case, you should return a BLANK result to inform the user that you do not know the answer.
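The blanking logic described above can be illustrated outside the model. The following is a minimal Python sketch, not part of the book's DAX model: the data, the `budget_for` name, and the values are all hypothetical. It mimics the two granularity counts and returns `None` (playing the role of DAX's BLANK) whenever the selection is below the budget granularity:

```python
# Hypothetical miniature model (illustrative only): budget is stored
# per brand, while products carry a finer-grained color attribute.
budget_by_brand = {"Contoso": 1000, "Litware": 600}

products = [
    {"key": 1, "brand": "Contoso", "color": "Red"},
    {"key": 2, "brand": "Contoso", "color": "Blue"},
    {"key": 3, "brand": "Litware", "color": "Red"},
]

def budget_for(selection):
    """Return the budget total only if the selection does not split any
    brand below the budget granularity; otherwise None (like BLANK)."""
    selected_brands = {p["brand"] for p in selection}
    # Products visible under the actual selection (sales granularity).
    at_sales_granularity = len(selection)
    # Products visible when only the brand filter is kept, which is
    # what removing all other filters does in the DAX version.
    at_budget_granularity = sum(
        1 for p in products if p["brand"] in selected_brands
    )
    if at_sales_granularity != at_budget_granularity:
        return None  # below the budget granularity: refuse to answer
    return sum(budget_by_brand[b] for b in selected_brands)

# Slicing by brand is at the budget granularity, so a value is returned:
print(budget_for([p for p in products if p["brand"] == "Contoso"]))  # 1000
# Slicing by color is below it, so the measure blanks out:
print(budget_for([p for p in products if p["color"] == "Red"]))      # None
```

Filtering by color selects two of the three products of the involved brands, so the two counts diverge and the sketch refuses to produce a number, which is exactly the behavior the chapter builds in DAX.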
The key to solving this scenario is being able to count the number of products (or stores) selected at the Sales granularity and compare them with the number of products selected at the Budget granularity. If the two numbers are equal, then the filter induced by the products will produce meaningful values in both fact tables. If, on the other hand, the numbers are different, then the filter will produce an incorrect result on the table with the lower granularity. To achieve this, you define the following two measures:
ProductsAtSalesGranularity := COUNTROWS ( Product )

ProductsAtBudgetGranularity :=
CALCULATE (
    COUNTROWS ( Product ),
    ALL ( Product ),
    VALUES ( Product[Brand] )
)
ProductsAtSalesGranularity counts the number of products at the maximum granularity—that is, the product key. Sales is linked to Product at this granularity. ProductsAtBudgetGranularity, on the other hand, counts the number of products, taking into account only the filter on Brand and removing any other existing filters. This is the very definition of the granularity of Budget. You can appreciate the difference between the two measures if you build a report like the one shown in Figure 9-11, which slices the two measures by brand and color.
FIGURE 9-11 This report shows the number of products at different granularities.
The two measures report the same value only when there is a filter on the brand and no other filter is applied. In other words, the two numbers are equal only when the Product table is sliced at the Budget granularity. The same needs to be done for Store, too, where the granularity is country/region. You define two measures to check the granularity at the store level by using the following code:
StoresAtSalesGranularity := COUNTROWS ( Store )

StoresAtBudgetGranularity :=
CALCULATE (
    COUNTROWS ( Store ),
    ALL ( Store ),
    VALUES ( Store[CountryRegion] )
)
When you use them in a report, the two measures return the same number at the budget granularity and above, as shown in Figure 9-12.
FIGURE 9-12 This report shows the number of stores at different granularities.
In fact, the numbers are identical not only at the country/region level, but also at the continent level. This is correct because Continent sits above CountryRegion in the geographic hierarchy: a continent filter selects whole countries, so the value of Budget at the continent level is a meaningful aggregate.
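The same count comparison explains why continent-level filters are safe. Here is a small Python sketch with hypothetical store data (the rows, the `counts` helper, and the figures are illustrative, not from the book): filtering by continent keeps whole countries, so the two counts stay equal, while filtering an individual store splits a country and makes them diverge:

```python
# Hypothetical store dimension (illustrative only); the budget
# granularity is country/region, and continents group whole countries.
stores = [
    {"key": 1, "country": "Germany", "continent": "Europe"},
    {"key": 2, "country": "Germany", "continent": "Europe"},
    {"key": 3, "country": "France",  "continent": "Europe"},
    {"key": 4, "country": "USA",     "continent": "North America"},
]

def counts(selection):
    """Stores visible at sales granularity vs. at budget granularity
    (keeping only the country filter, as the DAX measures do)."""
    countries = {s["country"] for s in selection}
    at_sales = len(selection)
    at_budget = sum(1 for s in stores if s["country"] in countries)
    return at_sales, at_budget

# A continent filter keeps whole countries, so the counts agree:
print(counts([s for s in stores if s["continent"] == "Europe"]))  # (3, 3)
# A filter on a single store splits Germany, so they diverge:
print(counts([s for s in stores if s["key"] == 1]))               # (1, 2)
```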
The last step, to make sure you show only meaningful numbers for Budget, is to blank out the Budget measure when the measures we have written so far do not match. This can be easily accomplished by using a conditional formula as in the following code:
Budget 2009 :=
IF (
    AND (
        [ProductsAtBudgetGranularity] = [ProductsAtSalesGranularity],
        [StoresAtBudgetGranularity] = [StoresAtSalesGranularity]
    ),
    SUM ( Budget[Budget] )
)
The additional condition ensures that a value is returned if and only if the report