Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo) (z-lib.org).pdf
Скачиваний:
11
Добавлен:
14.08.2022
Размер:
18.87 Mб
Скачать

country or region.

Historical Country Managers This contains the historical sales manager for each country or region.

The next step will be to use these two tables to update both the Customer table and the Sales table.

Fixing granularity in the dimension

You are going to use these two tables to set the right granularity on the Customer and Sales tables. Let us focus on the Customer table first. To increase the granularity, you will need to merge the original Customer table with the Historical Country Managers table. The Customer table contains the CountryRegion column. If you join the Customer table with the Historical Country Managers table based on the CountryRegion, the result will contain more rows, one for each different manager for a given customer. It will not contain a new version of the customer for each year, which would be the worst-case granularity. Instead, because of the grouping operation that is performed on the Historical Country Managers table, it will contain the right number of versions for each customer.

After you have done these two operations, the data set looks like Figure 5-17, with the OriginalCustomerKey column sorted in ascending order.

FIGURE 5-17 The new Customer table shows the adjusted granularity and the actual and historical managers denormalized.

Focus your attention on the first three lines. They represent Jon Yang, a customer in Australia who has had three different sales managers over time: Paul, Mark, and Louise. This is correctly represented in the model, but there is a problem. The column that contains the customer key (called OriginalCustomerKey) is no longer a good key for the table. In fact, that code represents the customer, whereas now we are moving toward a representation of

the versions of the customer. Because the customer key is no longer unique, it cannot be used as our key. Thus, we need a new one.

Generally, you can build a new key by simply adding a new column with an index, which is a number starting from 1 and growing by 1 for each row. This is the preferred technique used by database managers. In our case, the granularity of the new table is at the customer/year level, where the year you use is the last year when the manager was on duty for the country. Therefore, you can safely build a new column by simply concatenating OriginalCustomerKey with Year, which is the year of the Historical Country Managers table denormalized in the new Customer table. Figure 5-18 shows the resulting table with the new key.

FIGURE 5-18 OriginalCustomerKey is not a key. It’s better to use the new CustomerKey, which contains the year.

Having reached this point, you have moved the granularity in the Customer table from the original one (that is, the customer) to the worst-case (that is, the customer and year). This table is not the final one, but it is a useful intermediate step. We saved it under the name CustomerBase.

The final step is to fix the granularity and move to the right one. This step is similar to what you did with the sales managers for the various countries or regions. Starting from CustomerBase, you remove all the columns except the granularity columns, and perform a grouping by OriginalCustomerKey, Actual Manager, and Historical Manager. You then take the MAX of CustomerKey and name it NewCustomerKey. The result is shown in Figure 5-19.

FIGURE 5-19 This temporary table is now at the right level of granularity.

This grouping operation was useful to build the correct granularity, but in performing it, you had to remove all the columns from the original Customer table. The next step is to restore the needed columns. First, you remove all the columns from the table, maintaining only the NewCustomerKey column, as shown in Figure 5-20.

FIGURE 5-20 This table contains only the customer keys, but now it is at the correct granularity.

The final step is to merge this table with CustomerBase, based on the customer key, and to retrieve all the needed columns. The result is shown in Figure 5-21, where you can easily spot the fact that customers vary in the number of versions, depending on how many managers they had over time.

FIGURE 5-21 The final Customer table has the correct granularity and all the relevant columns.

Next, you will perform a similar operation with the Sales table. (Note that because you have changed the key of the Customer table, the CustomerKey column in Sales is no longer a good key to use.)

Fixing granularity in the fact table

In Sales, you cannot compute the new key based on the year of the sale. In fact, if the sales manager of a country or region did not change, then the new key does not depend on the year of the sale. Instead, you can search for the new key. The new cardinality of the dimension depends on the customer, the actual manager, and the historical manager. Given these three values, you can search in the new Customer dimension. There, you will find the new CustomerKey.

You need to perform the following steps on the Sales table to fix its granularity:

1.In the original Sales table, add a column that contains the year of the sales.

2.Perform a join with the CustomerBase table to obtain the customer country as well as the actual and historical managers. (You use CustomerBase because you can search for the sales year there. In the CustomerBase table, you still have one different customer per year. The table had the wrong granularity, but it now proves useful because you can easily search it using the year of the sale.)

The result of the merge operation is in the new column, as shown in Figure 5- 22.

FIGURE 5-22 You need to merge Sales with CustomerBase to retrieve the actual and historical managers.

Having reached this point, you can expand the actual and historical managers. You can then use them to perform a second join with the Customer table, which has the right granularity. Then search for the customer that has the same customer code, actual manager, and historical manager. This final lookup will let you retrieve the new customer key and will solve the granularity on the fact table.

Figure 5-23 shows an extract of the Sales table after the processing. The first highlighted row is about a customer whose manager was first Mark and then changed to Louise. Thus, that customer will have different versions, and the individual row (related to 2007, when the manager was still Mark) points to the 2007 version of the customer. In the second row, the sales manager never changed, so there is only one row for the customer (marked 2009, the last year). In addition, the sale—even if it happened in 2007—points to the 2009 version of the customer. In the final version of the Sales table, the lookup column will no longer be present; it is only part of the processing.

FIGURE 5-23 The two highlighted rows show the different handling of customers who changed sales managers versus customers who did not.

Loading SCDs requires a lot of care. The following is a brief recap of the steps you’ve performed so far: