Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo)

Note

As you learn more about data modeling, you might encounter a situation in which you think it is best to deviate from star schemas. Don’t. There are several reasons why star schemas are nearly always your best option. Unfortunately, most of these reasons can be appreciated only after you have had some experience in data modeling. If you don’t have a lot of experience, just trust the tens of thousands of BI professionals all around the planet who know that star schemas are nearly always the best option—no matter what.

Understanding the importance of naming objects

When you build your data model, you typically load data from a SQL Server database or some other data source. Most likely, the developer of the data source decided the naming convention. There are so many naming conventions that it is fair to say everybody has a personal one.

When building data warehouses, some database designers prefer to use Dim as a prefix for dimensions and Fact for fact tables. Thus, it is very common to see table names such as DimCustomer and FactSales. Others like to differentiate between views and physical tables, using prefixes like Tbl for tables and Vw for views. Still others think names are ambiguous and prefer to use numbers instead, like Tbl_190_Sales. We could go on and on, but you get the point: There are many standards, and each one has pros and cons.

Note

We could argue whether these standards make any sense in the database, but that is outside the scope of this book. We will stick to discussing how to handle standards in the data model that you browse with Power BI or Excel.

You do not need to follow any technical standard; only follow common sense and ease of use. For example, it would be frustrating to browse a data model where tables have silly names, like VwDimCstmr or Tbl_190_FactShpmt. These names are strange and non-intuitive. Still, we encounter these types of names in data models quite a lot. And we are talking about table names only. When it comes to column names, the lack of creativity gets even more extreme. Our only word of advice here is to get rid of all these names and use readable names that clearly identify the dimension or the fact table.
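As a rough illustration of the clean-up just described, the following Python sketch strips technical prefixes from source table names. It is not part of Power BI or Excel, where you would normally rename tables in the model itself. The function name `strip_technical_prefixes` and the prefix list are assumptions made for this example, based only on the prefixes mentioned in the text.

```python
import re

# Hypothetical prefix list, taken from the examples in the text (Dim, Fact, Tbl, Vw).
# Adjust it to whatever convention your own data source uses.
TECHNICAL_PREFIXES = ("Dim", "Fact", "Tbl", "Vw")


def strip_technical_prefixes(name: str) -> str:
    """Remove leading technical prefixes, numeric tags, and underscores from a table name."""
    previous = None
    while previous != name:  # repeat until a full pass changes nothing
        previous = name
        # Drop leading underscores and digit groups such as "_190_".
        name = re.sub(r"^[_\d]+", "", name)
        for prefix in TECHNICAL_PREFIXES:
            # Strip the prefix only when an uppercase letter, digit, or underscore
            # follows it, so that names like "Dimension" or "Factory" stay intact.
            if re.match(rf"{prefix}(?=[A-Z0-9_])", name):
                name = name[len(prefix):]
                break
    return name


for source_name in ("DimCustomer", "FactSales", "Tbl_190_Sales", "VwDimCstmr"):
    print(source_name, "->", strip_technical_prefixes(source_name))
```

Note that VwDimCstmr only becomes Cstmr: removing prefixes is mechanical, but expanding an abbreviation into a readable name such as Customer still requires a human decision or an explicit mapping.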

We have built many analytical systems over the years. Over time, we have developed a very simple set of rules for table and column naming:

Table names for dimensions should consist of only the business asset name, in singular or plural form. Thus, customers are stored in a table called Customer or Customers. Products are stored in a table called Product or Products. (In our opinion, singular is preferable because it works slightly better with natural language queries in Power BI.)

If the business asset contains multiple words, use casing to separate the words. Thus, product categories are stored in ProductCategory, and the country of shipment might be CountryShip or CountryShipment. Alternatively, you can use spaces instead of casing, for example, using table names like Product Category. This is fine, but it does make the writing of DAX code a bit harder. It is more a matter of personal choice than anything else.

Table names for facts should consist of the business name for the fact, which is always plural. Thus, sales are stored in a table named Sales, and purchases, as you might imagine, are stored in a table named Purchases. By using plural instead of singular, when you look at your model, you will naturally think of one customer (the Customer table) with many sales (the Sales table), stating and enforcing the nature of the one-to-many relationship whenever you look at the tables.

Avoid names that are too long. Names like CountryOfShipmentOfGoodsWhenSoldByReseller are confusing. Nobody wants to read such a long name. Instead, find good abbreviations by eliminating the useless words.

Avoid names that are too short. We know you are used to speaking with acronyms. But while acronyms might be useful when speaking, they are unclear when used in reports. For example, you might use the acronym CSR for country of shipment for resellers, but that will be hard to remember for anybody who does not work with you all day long. Remember: Reports are meant to be shared with a vast number of users, many of whom do not understand your acronyms.

The key to a dimension is the dimension name followed by Key. Thus, the primary key of Customer is CustomerKey. The same goes for foreign keys.

You will know something is a foreign key because it is stored in a table with a different name. Thus, CustomerKey in Sales is a foreign key that points to the Customer table, whereas in Customer it is the primary key.
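The key-naming rule is mechanical enough to check automatically. The sketch below is a minimal Python illustration, not a Power BI feature: given a table name, its columns, and the set of dimension names, it labels each column that follows the `<Dimension>Key` pattern as a primary or a foreign key. The function names and the sample columns (Quantity, Amount, Name) are hypothetical.

```python
def expected_key(dimension_name: str) -> str:
    """Rule from the text: the key of a dimension is its name followed by 'Key'."""
    return dimension_name + "Key"


def classify_key_columns(table: str, columns: list, dimensions: set) -> dict:
    """Label each '<Dimension>Key' column as a primary key or a foreign key."""
    roles = {}
    for column in columns:
        if not column.endswith("Key"):
            continue
        target = column[: -len("Key")]
        if target not in dimensions:
            continue  # ends in "Key" but names no known dimension
        # Same name as the hosting table: primary key. Different name: foreign key.
        roles[column] = "primary key" if target == table else f"foreign key to {target}"
    return roles


# Hypothetical model with two dimensions and one fact table.
dimensions = {"Customer", "Product"}
print(classify_key_columns("Customer", ["CustomerKey", "Name"], dimensions))
print(classify_key_columns("Sales", ["CustomerKey", "ProductKey", "Quantity", "Amount"], dimensions))
```

Under these assumptions, CustomerKey is reported as the primary key when it appears in Customer and as a foreign key to Customer when it appears in Sales, which mirrors how a reader would interpret the names.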

This set of rules is very short. Anything else is up to you. You can decide the names of all the remaining columns by following the same common sense. A well-named data model is easy to share with anybody. In addition, you are much more likely to find errors or issues in the data model if you follow these standard naming techniques.

Tip

When you are in doubt about a name, ask yourself, “Will somebody else be able to understand this name?” Don’t think you are the only user of your reports. Sooner or later, you will want to share a report with somebody else, who might have a completely different background than yours. If that person will be able to understand your names, then you are on the right track. If not, then it is time to re-think the names in your model.

Conclusions

In this chapter, you learned the basics of data modeling, namely:

A single table is already a data model, although in its simplest form.

With a single table, you must define the granularity of your data. Choosing the right granularity makes calculations much easier to author.

The difference between working with a single table and multiple ones is that when you have multiple tables, they are joined by relationships.

In a relationship, there is a one side and a many side, indicating how many rows you are likely to find if you follow the relationship. Because one product has many sales, the Product table will be the one side, and the Sales table will be the many side.

If a table is the target of a relationship, it needs to have a primary key, which is a column with unique values that can be used to identify a single row. If a key is not available, then the relationship cannot be defined.

A normalized model is a data model where the data is stored in a compact way, avoiding repetitions of the same value in different rows. This structure typically increases the number of tables.

A denormalized model has a lot of repetitions (for example, the name Red is repeated multiple times, once for each red product), but has fewer tables.

Normalized models are used for OLTP systems, whereas denormalized models are used for analytical data models.

A typical analytical model differentiates between informational assets (dimensions) and events (facts). Classifying each entity in the model as either a fact or a dimension yields a model in the form of a star schema. Star schemas are the most widely used architecture for analytical models, and for a good reason: They nearly always work well.
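Several of the points above (primary keys, the one side and the many side of a relationship, dimensions and facts) can be illustrated with a very small Python sketch. It does not show how Power BI or Power Pivot implement relationships; the sample rows and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical sample data: Product is a dimension (one side), Sales is a fact table (many side).
product = [
    {"ProductKey": 1, "Name": "Bike", "Color": "Red"},
    {"ProductKey": 2, "Name": "Helmet", "Color": "Red"},
    {"ProductKey": 3, "Name": "Gloves", "Color": "Blue"},
]
sales = [
    {"ProductKey": 1, "Quantity": 2},
    {"ProductKey": 1, "Quantity": 1},
    {"ProductKey": 2, "Quantity": 5},
    {"ProductKey": 3, "Quantity": 4},
]


def is_valid_primary_key(rows, column):
    """A primary key must hold a distinct value on every row."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def quantity_by(dimension_rows, fact_rows, key, attribute):
    """Follow the relationship from each fact row to its dimension row, then group by an attribute."""
    if not is_valid_primary_key(dimension_rows, key):
        raise ValueError(f"{key} is not unique, so the relationship cannot be defined")
    lookup = {row[key]: row for row in dimension_rows}  # one side, indexed by its key
    totals = defaultdict(int)
    for fact in fact_rows:  # many side: several sales can point to the same product
        totals[lookup[fact[key]][attribute]] += fact["Quantity"]
    return dict(totals)


print(is_valid_primary_key(product, "ProductKey"))  # True: each product has its own key
print(is_valid_primary_key(product, "Color"))       # False: Red is repeated
print(quantity_by(product, sales, "ProductKey", "Color"))  # {'Red': 8, 'Blue': 4}
```

Color repeats across products, so it cannot serve as a key, while ProductKey identifies exactly one row and lets every sale find its product.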