Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo)

Chapter 8. Many-to-many relationships

Many-to-many relationships are an important tool in the data analyst’s toolbelt. Often, they are viewed as problematic because many-to-many relationships tend to make the model more complex than usual. However, we suggest you start thinking about many-to-many relationships as an opportunity instead. In fact, it’s easy to handle many-to-many relationships. You only need to learn the basic technique and then use it at your convenience.

As you will learn in this chapter, many-to-many relationships are extremely powerful and let you build great data models, even if they hide some complexity both in the modeling and in the interpretation of the results. Moreover, many-to-many relationships are present in nearly every data model. For example, simple star schemas contain many-to-many relationships. We want to show you how to recognize these relationships and, most importantly, how to take advantage of them to derive good insights from your data.

Introducing many-to-many relationships

Let us start by introducing many-to-many relationships. There are scenarios where the relationship between two entities cannot be expressed with a simple relationship. The canonical example is that of a current account. The bank stores transactions related to the current account. However, the current account can be owned by multiple individuals at the same time. Conversely, one individual might possess multiple current accounts. Thus, you cannot store the customer key in the Accounts table, and at the same time, you cannot store the account key in the Customers table. This kind of relationship is, by its very nature, that of many entities related to many other entities, and it cannot be expressed with a single column.

Note

There are a lot of other scenarios in which many-to-many relationships appear, such as sales agents with orders, where an order can be overseen by multiple sales agents. Another example could be house ownership, where an individual might possess multiple houses, and the same house might be owned by multiple individuals.

The canonical way of modeling a many-to-many relationship is to use a bridge table that contains the information about which account is owned by which individual. Figure 8-1 shows you an example of a model with many-to-many relationships based on the current account scenario.

FIGURE 8-1 The relationship between Customers and Accounts is through a bridge table, AccountsCustomers.

The first important thing to note about many-to-many relationships is that they are a different kind of relationship from the point of view of the data model, but when implemented, they are transformed into a pair of standard one-to-many relationships. Thus, many-to-many is more of a concept than a physical relationship. We speak of, think about, and work with many-to-many as a kind of relationship, but we implement it as a pair of relationships.

Note that the two relationships that link Customers and Accounts with the bridge table are in opposite directions. In fact, both relationships start from the bridge table and reach the two dimensions. The bridge table is always on the many side.

Why is many-to-many more complex than other kinds of relationships? Here are several reasons:

Many-to-many does not work by default in a data model. To be precise, it might or might not work, depending on the version of Tabular you are using and its settings. In Power BI, you can enable bidirectional filtering, whereas in Microsoft Excel (up to and including Excel 2016), you will need to author some DAX code to make formulas traverse many-to-many relationships the right way.

Many-to-many typically generates non-additive calculations. This makes the numbers returned when using many-to-many slightly more difficult to understand, and makes debugging your code a little trickier.

Performance might be an issue. Depending on the size of the many-to-many filtering, the traversal of two relationships in opposite directions might become expensive. Thus, when working with many-to-many, you might face performance issues that require more attention.

Let us analyze all these points in more detail.

Understanding the bidirectional pattern

By default, a filter on a table moves from the one side to the many side, but it does not move from the many side to the one side. Thus, if you build a report and slice by customer, the filter will reach the bridge table, but at that point, the filter propagation stops. Consequently, the Accounts table will not receive the filter coming from Customers, as shown in Figure 8-2.

FIGURE 8-2 The filter can move from the one side to the many side, but not from the many side to the one side.

If you build a report that contains the customers on the rows and a simple SUM of the Amount column as the value, the same number is repeated for all the rows. This is because the filter coming from Customers is not working against Accounts, and from there to the transactions. The result is shown in Figure 8-3.

FIGURE 8-3 You cannot filter the amount per customer because of the many-to-many relationship.
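For reference, the simple aggregation described above can be sketched as the following measure. The name Amount matches the label used in the caption of Figure 8-4; the exact definition is an assumption based on the description "a simple SUM of the Amount column."

```dax
-- Plain aggregation: there is no CALCULATE modifier here, so only the
-- default one-to-many filter propagation applies.
-- A filter on Customers reaches AccountsCustomers and stops there.
Amount := SUM ( Transactions[Amount] )
```

Because nothing in this measure lets the filter travel from the bridge table to Accounts, every customer row evaluates it over all the transactions.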

You can solve this scenario by setting the propagation of the filter on the relationship between the bridge table and the Accounts table as bidirectional. This is a simple setting in Power BI, where bidirectional filtering is available as part of the modeling tools. In Excel, however, you must use DAX.

One approach is to activate bidirectional filtering in the model, in which case it will be active for all the calculations. Alternatively, you can use the CROSSFILTER function as a parameter of CALCULATE to activate bidirectional filtering for the duration of CALCULATE. You can see this in the following code:


SumOfAmount := CALCULATE (
    SUM ( Transactions[Amount] ),
    CROSSFILTER ( AccountsCustomers[AccountKey], Acc
)

The result is the same in both cases. The filter permits propagation from the many side to the one side of the relationship that links the accounts with the bridge. Thus, the Accounts table will show only the rows that belong to the selected customer.
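The CROSSFILTER call in the listing is cut off after its first argument. A minimal sketch of what the complete pattern could look like follows. It assumes that the key column on the Accounts side is named Accounts[AccountKey] and that the direction argument is BOTH; neither is visible in the listing.

```dax
-- Sketch of the bidirectional pattern.
-- ASSUMPTIONS: Accounts[AccountKey] and the BOTH direction are not shown
-- in the original listing, which is truncated.
SumOfAmount :=
CALCULATE (
    SUM ( Transactions[Amount] ),
    -- Enable filtering in both directions on the relationship between
    -- the bridge table and Accounts, only while this CALCULATE runs.
    CROSSFILTER ( AccountsCustomers[AccountKey], Accounts[AccountKey], BOTH )
)
```

CROSSFILTER takes the two columns that define an existing relationship plus a direction, so it changes how that relationship propagates filters without creating a new one.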

In Figure 8-4, you can see the report with this new measure side-by-side with the previous one that uses a simple SUM.

FIGURE 8-4 SumOfAmount computes the correct value, whereas Amount always shows the grand total.

There is a difference between setting the relationship as bidirectional and using the DAX code. In fact, if you set the relationship as bidirectional, then any measure will benefit from the automatic propagation of the filter from the many side to the one side. However, if you rely on DAX code, then you need to author all the measures using the same pattern to force the propagation. If you have lots of measures, then it is somewhat annoying to have to use the three lines of the bidirectional pattern for all of them. On the other hand, setting the bidirectional filtering on a relationship might make the model ambiguous. For this reason, you cannot always set the relationship as bidirectional, and you will need to write some code.
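To illustrate the repetition this implies, here is a sketch of a second measure written with the same pattern. The measure NumOfTransactions is hypothetical and does not come from the text, and the column Accounts[AccountKey] and the BOTH direction are assumed, as the listing that uses CROSSFILTER is truncated.

```dax
-- Hypothetical second measure: it must repeat the same CROSSFILTER
-- modifier to see the filter coming from Customers.
-- ASSUMPTIONS: Accounts[AccountKey] and BOTH are not confirmed by the text.
NumOfTransactions :=
CALCULATE (
    COUNTROWS ( Transactions ),
    CROSSFILTER ( AccountsCustomers[AccountKey], Accounts[AccountKey], BOTH )
)
```

Every measure that needs to be sliced by customer would carry the same modifier, which is the maintenance cost that a bidirectional relationship set in the model avoids.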

With that said, Excel does not offer you bidirectional relationships, so you have no choice but to use DAX. In Power BI, on the other hand, you can choose the technique you prefer. In our experience, bidirectional relationships are more convenient and tend to lower the number of errors in your code.

You can obtain a similar effect to that of CROSSFILTER by leveraging table expansion in DAX. Explaining table expansion would require a full chapter by itself; we cover it in detail in our book The Definitive Guide to DAX. Here, we only want to note that, by using table expansion, you can write the previous measure in the following way:


SumOfAmount := CALCULATE (
    SUM ( Transactions[Amount] ),
    AccountsCustomers
)

The result is nearly the same as before. You still obtain the filter propagation,