Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo)

FIGURE 3-10 In the model, there are now two Age Ranges tables.

Using orders and invoices

The next example is a very practical one that you are likely to encounter in your daily work. Suppose you receive orders from your customers and, once a month, you send out an invoice that includes multiple orders. Each invoice contains some orders, but the relationship between the invoices and the orders is not clearly stated in the model. So, we will need to work a bit to re-create it.

You start with a data model like the one shown in Figure 3-11.

FIGURE 3-11 The data model of orders and invoices is a simple star schema.

This time, the starting data model is a star schema with two fact tables and a dimension in the middle. In the Customer dimension, we have already defined the following two measures:

Amount Ordered := SUM ( Orders[Amount] )

Amount Invoiced := SUM ( Invoices[Amount] )

With the two measures in place, you can easily build a report that shows the amount ordered and the amount invoiced for each customer. This makes it easy to spot how much you still need to invoice each customer, as in the example shown in Figure 3-12.

FIGURE 3-12 The amount ordered and invoiced per customer is an easy report to build.
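
The gap between the two measures can also be exposed as a measure of its own. The following is a minimal sketch; the measure name Amount To Invoice is hypothetical and is not part of the original model.

```dax
-- Hypothetical helper measure (not defined in the original model):
-- the amount ordered that has not been invoiced yet.
Amount To Invoice := [Amount Ordered] - [Amount Invoiced]
```

Like the two measures it builds on, this difference is reliable at the granularity of Customer, which is the only dimension that filters both fact tables in this model.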

If you are interested only in the top-level numbers, as in this PivotTable, everything works just fine. Unfortunately, you will face problems as soon as you want to dig a bit more into the details. For example, how do you determine which orders have not yet been invoiced? Before proceeding, spend some time looking at the data model shown in Figure 3-11 and try to spot where the problem is. When you’re finished, continue reading. Because this example hides some complexity, we will need to do some trial and error to identify the issue. Thus, we will show you several wrong solutions to highlight the reason why they are wrong.

If you put the order number in the PivotTable, the result will be hard to read and understand, as shown in Figure 3-13, where all the orders are listed under John, Melanie, and Paul.

FIGURE 3-13 When you drill down to the order level, the Amount Invoiced column returns the wrong results.

This scenario is very similar to the one at the beginning of this chapter, which had two completely denormalized fact tables. The filter on the order number is not effective against the invoices because an invoice does not have an order number. Therefore, the value shown by Amount Invoiced uses the filter only on the customer, showing the total invoiced per customer on all the rows.

At this point, it is worth repeating one important concept: The number reported by the PivotTable is correct. It is the correct number given the information present in the model. If you carefully think about it, there is no way the engine can split the amount invoiced among the different orders because the information about which order was invoiced is missing from the model. Thus, the solution to this scenario requires us to build a proper data model. It needs to contain not only the information about the total invoiced, but also the details about which orders have been invoiced and which invoice contains what orders. As usual, before moving further, it is worth spending some time trying to figure out how you would solve this case.

There are multiple solutions to this scenario, depending on the complexity of the data model. Before going into more details, let us take a look at the data shown in Figure 3-14.

FIGURE 3-14 The figure shows the actual data used in this model.

As you can see, the Invoices and Orders tables both have a Customer column, which contains customer names. Customer is on the one side of two many-to-one relationships that start from Orders and Invoices. What we need to add to the model is a new relationship between Orders and Invoices that states which order is invoiced with what invoice. There are two possible scenarios:

Each order is related to an individual invoice. You face this scenario when an order is always fully invoiced. Thus, an invoice can contain multiple orders, but one order always has a single invoice. You can read, in this description, a one-to-many relationship between the invoices and orders.

Each order can be invoiced in multiple invoices. If an order can be partially invoiced, then the same order might belong to multiple invoices. If this is the case, then one order can belong to multiple invoices, and, at the same time, one invoice can contain multiple orders. In such a case, you are facing a many-to-many relationship between orders and invoices, and the scenario is a bit more complex.

The first scenario is very simple to solve. In fact, you only need to add the invoice number to the Orders table by using one additional column. The resulting model is shown in Figure 3-15.

FIGURE 3-15 The highlighted column contains the invoice number for each given order.

Even though it looks like a simple modification to the model, it is not so easy to handle. In fact, when you load the new model and try to build the relationship, you are in for an unpleasant surprise: The relationship can be created, but it is left inactive, as shown in Figure 3-16.

FIGURE 3-16 The relationship between the Orders and Invoices tables is created as an inactive relationship.

Where is the ambiguity in the model? If the relationship between Orders and Invoices were active, then you would have two paths from Orders to Customer: a direct one, using the relationship between Orders and Customer, and an indirect one going from Orders to Invoices and then, finally, to Customer. Even if, in this case, the two paths end up pointing to the same customer, this is not known to the model and depends only on the data. Nothing in the model prevents you from incorrectly relating an order with an invoice that points to a customer who is different from the one in the order. Thus, the model, as it is, does not work.
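
Because the consistency of the two paths depends only on the data, it can be useful to check it explicitly. The following is a minimal sketch of such a check, not part of the original model. It assumes that the invoice number is stored in columns named Orders[Invoice] and Invoices[Invoice] (the text does not name them), and it relies on the Customer column that both tables contain. LOOKUPVALUE is used because it does not depend on the inactive relationship.

```dax
-- Sketch only. Assumed names: Orders[Invoice] and Invoices[Invoice]
-- hold the invoice number; the measure name is hypothetical.
-- Counts the invoiced orders whose customer differs from the customer
-- of the invoice they point to.
Orders With Mismatched Customer :=
COUNTROWS (
    FILTER (
        Orders,
        ( NOT ISBLANK ( Orders[Invoice] ) )
            && Orders[Customer]
                <> LOOKUPVALUE (
                    Invoices[Customer],
                    Invoices[Invoice], Orders[Invoice]
                )
    )
)
```

A blank result means that no mismatch was found. An order that points to an invoice number missing from Invoices is also counted, because the lookup returns blank for it.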

The way to fix this is much simpler than expected. In fact, if you look carefully at the model, there is a one-to-many relationship between Customer and Invoices, and another one-to-many relationship between Invoices and Orders. The customer of an order can be safely retrieved using Invoices as a middle table. Thus, you can remove the relationship between Customer and Orders and rely on the other two, obtaining the model shown in Figure 3-17.

FIGURE 3-17 When the relationship between Orders and Customer is removed, the model is much simpler.

Does the model in Figure 3-17 look familiar? This is the very same pattern as the header/detail data model that we discussed in Chapter 2, “Using header/detail tables.” You now have two fact tables: one containing the invoices and the other one containing the orders. Orders acts as the detail table, whereas Invoices acts as the header table.

Being a header/detail pattern, this model inherits all the pros and cons of that model. To some extent, the problem of the relationship is solved, but the problem of amounts is not. If you browse the model with a PivotTable, the result is the same as the one shown in Figure 3-13, with all the order numbers listed for all the customers. The reason for this is that, whatever order you choose, the total invoiced per customer is always the same. Even if the chain of relationships is set in the correct way, the data model is still incorrect.

In reality, the situation is a bit subtler than this. When you browse by customer name and order number, what data do you want to report? Review the following data measurements:

The total invoiced for that customer. This is the number reported by the system right now, and it looks wrong.

The total amount of the invoices that include the given order of the given customer. In such a case, you want to report the total invoiced if the order was present in the invoice, and make the result blank otherwise.

The amount of the order, if invoiced. In this case, you report the full amount of the order if it has been invoiced or a zero otherwise. You might report values higher than the actual invoices because you report the full order, not only the invoiced part.
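
The second and third options in the list above could be sketched as measures along the following lines. This is a minimal sketch written against the model of Figure 3-17, not a definitive implementation. The column names Orders[Invoice] and Invoices[Invoice] and both measure names are assumptions, and the second measure further assumes that an order not yet invoiced has a blank invoice number.

```dax
-- Sketch only. Assumed names: Orders[Invoice] and Invoices[Invoice] are
-- the two columns of the relationship between Orders and Invoices.

-- Second option: total of the invoices that include the selected orders.
-- CROSSFILTER lets a filter on Orders reach Invoices for this calculation.
Amount Invoiced Filtered By Orders :=
CALCULATE (
    [Amount Invoiced],
    CROSSFILTER ( Orders[Invoice], Invoices[Invoice], BOTH )
)

-- Third option: full amount of the orders that carry an invoice number.
-- Assumes that an order not yet invoiced has a blank Orders[Invoice].
Amount Ordered And Invoiced :=
CALCULATE (
    [Amount Ordered],
    NOT ( ISBLANK ( Orders[Invoice] ) )
)
```

As written, the second measure returns blank rather than zero for an order that has not been invoiced; adding `+ 0` to the expression would turn the blank into a zero.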

Note

The list might end here, but we are forgetting an important part. What