- •Contents at a glance
- •Contents
- •Introduction
- •Who this book is for
- •Assumptions about you
- •Organization of this book
- •Conventions
- •About the companion content
- •Acknowledgments
- •Errata and book support
- •We want to hear from you
- •Stay in touch
- •Chapter 1. Introduction to data modeling
- •Working with a single table
- •Introducing the data model
- •Introducing star schemas
- •Understanding the importance of naming objects
- •Conclusions
- •Chapter 2. Using header/detail tables
- •Introducing header/detail
- •Aggregating values from the header
- •Flattening header/detail
- •Conclusions
- •Chapter 3. Using multiple fact tables
- •Using denormalized fact tables
- •Filtering across dimensions
- •Understanding model ambiguity
- •Using orders and invoices
- •Calculating the total invoiced for the customer
- •Calculating the number of invoices that include the given order of the given customer
- •Calculating the amount of the order, if invoiced
- •Conclusions
- •Chapter 4. Working with date and time
- •Creating a date dimension
- •Understanding automatic time dimensions
- •Automatic time grouping in Excel
- •Automatic time grouping in Power BI Desktop
- •Using multiple date dimensions
- •Handling date and time
- •Time-intelligence calculations
- •Handling fiscal calendars
- •Computing with working days
- •Working days in a single country or region
- •Working with multiple countries or regions
- •Handling special periods of the year
- •Using non-overlapping periods
- •Periods relative to today
- •Using overlapping periods
- •Working with weekly calendars
- •Conclusions
- •Chapter 5. Tracking historical attributes
- •Introducing slowly changing dimensions
- •Using slowly changing dimensions
- •Loading slowly changing dimensions
- •Fixing granularity in the dimension
- •Fixing granularity in the fact table
- •Rapidly changing dimensions
- •Choosing the right modeling technique
- •Conclusions
- •Chapter 6. Using snapshots
- •Using data that you cannot aggregate over time
- •Aggregating snapshots
- •Understanding derived snapshots
- •Understanding the transition matrix
- •Conclusions
- •Chapter 7. Analyzing date and time intervals
- •Introduction to temporal data
- •Aggregating with simple intervals
- •Intervals crossing dates
- •Modeling working shifts and time shifting
- •Analyzing active events
- •Mixing different durations
- •Conclusions
- •Chapter 8. Many-to-many relationships
- •Introducing many-to-many relationships
- •Understanding the bidirectional pattern
- •Understanding non-additivity
- •Cascading many-to-many
- •Temporal many-to-many
- •Reallocating factors and percentages
- •Materializing many-to-many
- •Using the fact tables as a bridge
- •Performance considerations
- •Conclusions
- •Chapter 9. Working with different granularity
- •Introduction to granularity
- •Relationships at different granularity
- •Analyzing budget data
- •Using DAX code to move filters
- •Filtering through relationships
- •Hiding values at the wrong granularity
- •Allocating values at a higher granularity
- •Conclusions
- •Chapter 10. Segmentation data models
- •Computing multiple-column relationships
- •Computing static segmentation
- •Using dynamic segmentation
- •Understanding the power of calculated columns: ABC analysis
- •Conclusions
- •Chapter 11. Working with multiple currencies
- •Understanding different scenarios
- •Multiple source currencies, single reporting currency
- •Single source currency, multiple reporting currencies
- •Multiple source currencies, multiple reporting currencies
- •Conclusions
- •Appendix A. Data modeling 101
- •Tables
- •Data types
- •Relationships
- •Filtering and cross-filtering
- •Different types of models
- •Star schema
- •Snowflake schema
- •Models with bridge tables
- •Measures and additivity
- •Additive measures
- •Non-additive measures
- •Semi-additive measures
- •Index
- •Code Snippets
country or region.
Historical Country Managers This contains the historical sales manager for each country or region.
The next step will be to use these two tables to update both the Customer table and the Sales table.
Fixing granularity in the dimension
You are going to use these two tables to set the right granularity on the Customer and Sales tables. Let us focus on the Customer table first. To increase the granularity, you will need to merge the original Customer table with the Historical Country Managers table. The Customer table contains the CountryRegion column. If you join the Customer table with the Historical Country Managers table based on the CountryRegion, the result will contain more rows, one for each different manager for a given customer. It will not contain a new version of the customer for each year, which would be the worst-case granularity. Instead, because of the grouping operation that is performed on the Historical Country Managers table, it will contain the right number of versions for each customer.
After you have done these two operations, the data set looks like Figure 5-17, with the OriginalCustomerKey column sorted in ascending order.
FIGURE 5-17 The new Customer table shows the adjusted granularity and the actual and historical managers denormalized.
Focus your attention on the first three lines. They represent Jon Yang, a customer in Australia who has had three different sales managers over time: Paul, Mark, and Louise. This is correctly represented in the model, but there is a problem. The column that contains the customer key (called OriginalCustomerKey) is no longer a good key for the table. In fact, that code represents the customer, whereas now we are moving toward a representation of
the versions of the customer. Because the customer key is no longer unique, it cannot be used as our key. Thus, we need a new one.
Generally, you can build a new key by simply adding a new column with an index, which is a number starting from 1 and growing by 1 for each row. This is the preferred technique used by database managers. In our case, the granularity of the new table is at the customer/year level, where the year you use is the last year when the manager was on duty for the country. Therefore, you can safely build a new column by simply concatenating OriginalCustomerKey with Year, which is the year of the Historical Country Managers table denormalized in the new Customer table. Figure 5-18 shows the resulting table with the new key.
FIGURE 5-18 OriginalCustomerKey is not a key. It’s better to use the new CustomerKey, which contains the year.
Having reached this point, you have moved the granularity in the Customer table from the original one (that is, the customer) to the worst-case (that is, the customer and year). This table is not the final one, but it is a useful intermediate step. We saved it under the name CustomerBase.
The final step is to fix the granularity and move to the right one. This step is similar to what you did with the sales managers for the various countries or regions. Starting from CustomerBase, you remove all the columns except the granularity columns, and perform a grouping by OriginalCustomerKey, Actual Manager, and Historical Manager. You then take the MAX of CustomerKey and name it NewCustomerKey. The result is shown in Figure 5-19.
FIGURE 5-19 This temporary table is now at the right level of granularity.
This grouping operation was useful to build the correct granularity, but in performing it, you had to remove all the columns from the original Customer table. The next step is to restore the needed columns. First, you remove all the columns from the table, maintaining only the NewCustomerKey column, as shown in Figure 5-20.
FIGURE 5-20 This table contains only the customer keys, but now it is at the correct granularity.
The final step is to merge this table with CustomerBase, based on the customer key, and to retrieve all the needed columns. The result is shown in Figure 5-21, where you can easily spot the fact that customers vary in the number of versions, depending on how many managers they had over time.
FIGURE 5-21 The final Customer table has the correct granularity and all the relevant columns.
Next, you will perform a similar operation with the Sales table. (Note that because you have changed the key of the Customer table, the CustomerKey column in Sales is no longer a good key to use.)
Fixing granularity in the fact table
In Sales, you cannot compute the new key based on the year of the sale. In fact, if the sales manager of a country or region did not change, then the new key does not depend on the year of the sale. Instead, you can search for the new key. The new cardinality of the dimension depends on the customer, the actual manager, and the historical manager. Given these three values, you can search in the new Customer dimension. There, you will find the new CustomerKey.
You need to perform the following steps on the Sales table to fix its granularity:
1.In the original Sales table, add a column that contains the year of the sales.
2.Perform a join with the CustomerBase table to obtain the customer country as well as the actual and historical managers. (You use CustomerBase because you can search for the sales year there. In the CustomerBase table, you still have one different customer per year. The table had the wrong granularity, but it now proves useful because you can easily search it using the year of the sale.)
The result of the merge operation is in the new column, as shown in Figure 5- 22.
FIGURE 5-22 You need to merge Sales with CustomerBase to retrieve the actual and historical managers.
Having reached this point, you can expand the actual and historical managers. You can then use them to perform a second join with the Customer table, which has the right granularity. Then search for the customer that has the same customer code, actual manager, and historical manager. This final lookup will let you retrieve the new customer key and will solve the granularity on the fact table.
Figure 5-23 shows an extract of the Sales table after the processing. The first highlighted row is about a customer whose manager was first Mark and then changed to Louise. Thus, that customer will have different versions, and the individual row (related to 2007, when the manager was still Mark) points to the 2007 version of the customer. In the second row, the sales manager never changed, so there is only one row for the customer (marked 2009, the last year). In addition, the sale—even if it happened in 2007—points to the 2009 version of the customer. In the final version of the Sales table, the lookup column will no longer be present; it is only part of the processing.
FIGURE 5-23 The two highlighted rows show the different handling of customers who changed sales managers versus customers who did not.
Loading SCDs requires a lot of care. The following is a brief recap of the steps you’ve performed so far: