Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo) (z-lib.org).pdf
Скачиваний:
11
Добавлен:
14.08.2022
Размер:
18.87 Mб
Скачать

Product is on the many side of the relationship between Product and Product Subcategory. Thus, there are many products with the same category. If you add the Product Subcategory key to the fact table, you are not going to change its size, in terms of rows.

Whenever you build a new model, always take these considerations into account. After you define the dimensions, try to reduce the size of the fact table to its natural granularity, by performing grouping and pre-aggregation during the extraction of the data. The result is a smaller model, or more precisely, a model of optimal size: Neither too small nor too large, simply perfect.

Notice that the fact table discussed in this section includes no detailed information about the order number. If you put the order number in the table, then many rows will become different, even if they share the same set of dimensions. For example, two identical orders for the same customer could be grouped if you do not take into account the order number. However, as soon as you want the order number, they can no longer be grouped together. Thus, the presence of detailed information in the fact table changes the granularity of the table itself. You might have good reasons to store this detailed information in the fact table. It is only important to understand that its presence has a high cost in terms of size and memory use. Store these fields only if they are really needed for your reports.

Relationships at different granularity

Now that we have set the common terminology about granularity, let us see examples where granularity is different between fact tables. A great example is a budgeting scenario.

Analyzing budget data

When you need to analyze a budget, you are likely to check the difference between the actual sales (either in the past or in the current year) and the forecasted, budgeted figures. This leads to interesting key performance indicators (KPIs) and reports. To do this, however, you must face the problem of granularity. In fact, it is very unlikely that you have forecasted sales in the budget for each product and day. However, you have sales at the product and day level. Let us explore this with an example. Figure 9-3 shows a data model with a standard star schema for Sales and a Budget table, which contain the figures for the next year.

FIGURE 9-3 This data model contains sales and budget data in the same structure.

The budget information is present at the country/region and brand level. Obviously, it does not make sense to provide a day-by-day budget. When you forecast numbers, you do so at a higher level. The same applies at the product level. You cannot forecast the sales of individual products (unless you have very few sales). In the example shown in Figure 9-3 the budgeting manager focused on only two attributes: the country/region and the brand.

If you try to build a report that shows Sales and Budget in the same structure, you will find yourself in trouble because of the missing relationships. As you can see in Figure 9-4, you can slice sales in 2009 by brand using the product brand (that is, the Brand column in the Product table), but you cannot slice budget information using the same column, because there are no relationships between Product and Budget.

FIGURE 9-4 The product brand does not slice the Budget table because there are no relationships between Product and Budget.

You might remember that we dealt with a similar scenario in Chapter 1, “Introduction to data modeling.” At that time, you did not have the knowledge you have now, so we can now discuss in more detail the different options available to solve it.

It is important to note that the granularity issue is not a mistake in the model. The budget exists at its own granularity, whereas sales are present at a different one. Both tables are modeled the right way. However, it is not so easy to slice by both of them.

The first option we want to analyze is the easiest way of making the budget model a working one: reducing the granularity of both tables to match by removing details in Sales that are not present in Budget. You can easily do this by modifying the queries that load Sales and removing tables referenced by Sales that are at a level of detail not supported by the budget. The resulting model is shown in Figure 9-5.

FIGURE 9-5 By simplifying both tables, you can obtain a simple star schema.

To obtain the simplified model, we reduced the granularity of Sales by removing all the details. We had to remove the date references, the product key (replaced by the brand), and the StoreKey (replaced by the CountryRegion). We precomputed the amount of sales while grouping. All the dimensions are gone, replaced by two simple dimensions containing the brands and the country/region. The resulting model is straightforward and, as shown in Figure 9-6, it works just fine. You can slice Sales and Budget by the brand, obtaining meaningful figures.