Slowly changing dimensions are very common in the data warehouses and, basically, they store many versions of the same entity whenever a change happens in the columns for which history needs to be maintained. For example, the AdventureWorks data warehouse has a type 2 SCD in the DimProduct table. It can be easily checked for the product code “FR-M94S-38” which shows three different versions of itself, with changing product cost and list price.
This is exactly what we can expect to find in any data warehouse: each row contains the historical value of the attribute, so that we can perform analysis on how the attribute has changed over time. Moreover, a column (status, in the example) holds information about which is the current version of the product. Thus, looking at the table, it is very easy to detect that the current price of the product is 1,346.50. Nevertheless reporting on the current list price is not very easy because there is no way to use the current value to slice data in a PivotTable: no columns hold it; it has to be computed in some way.
The issue might not be evident with the list price but, if you think at a slowly changing dimension with the customer address as an historical attribute, it might be interesting to know where a customer lived but it is much more interesting to know where he lives now. Moreover, the best would be to be able to get both information, which might be interesting for different purposes. Unfortunately, in the AdventureWorks company they seem to be interested only in price changes, so I had to use this as an example.
Now, if you live in a corporate BI environment and want to perform current vs historical price comparisons, then somebody will create a view for you which, through some complex JOIN, will be able to expose both the historical and the current version of some columns. If you dare to ask him to consolidate the values in the dimension table, he will complain about ETL time, UPDATES needed on a dimension, log file usage and, at the end, he will probably refuse to do that.
Luckily, in the Self-Service BI world, things are much easier. If we want to add a calculated column to the table that computes, for each version of the product, which is the current list price (which happens to be the same for all the instances of the same product), it will be enough to use this formula:
=CALCULATE ( VALUES (DimProduct[ListPrice]), ALL (DimProduct), DimProduct[ProductAlternateKey] = EARLIER (DimProduct[ProductAlternateKey]), DimProduct[Status] = "Current" )
Its behavior is very easy to understand: compute the value of the listPrice searching in all products which have the same ProductAlternateKey as the current product and the Status equal to” “Current”. The only part worth noting is the usage of ALL (DimProduct) inside the CALCULATE filters. This is needed because the initial CALCULATE will consolidate the row context of the calculated column into a filter one, which we will need to remove in order to replace it with our specific filter. The result is straightforward:
Now CurrentListPrice is a calculated column and can be used to slice data in a PivotTable, as any other column:
Some words need to be spent if you ever face a data warehouse which does not hold a “Current” column like AdventureWorks. There is really no standard technique to identify the current version of a product: sometimes we have an “historical/current” column, as in this example, sometimes we need to infer it from the EndDate containing NULL, sometimes we need to find the last version of the record and use that. I am writing here the various flavors of the formula, just to keep them at hand in case they are needed.
If we need to rely on the emptiness of the EndDate column, the formula is easily adapted:
=CALCULATE ( VALUES (DimProduct[ListPrice]), ALL (DimProduct), DimProduct[ProductAlternateKey] = EARLIER (DimProduct[ProductAlternateKey]), ISBLANK (DimProduct[EndDate]) )
While, if we need to search for the last occurrence of the record with the same code, we need some more DAX acrobatics to express the formula:
=CALCULATE (
VALUES (DimProduct[ListPrice]),
FILTER (
ALL (DimProduct),
DimProduct[ProductAlternateKey] = EARLIER (DimProduct[ProductAlternateKey]) &&
DimProduct[StartDate] = CALCULATE (
MAX (DimProduct[StartDate]),
ALL (DimProduct),
DimProduct[ProductAlternateKey] = EARLIER (DimProduct[ProductAlternateKey])
)
)
)
This last formula is the most interesting one, since it relies on a minimum of information (just the start date) and performs the whole logic of current value detection, searching, for each row, the one containing the same code but the last date of validity. Needless to say the three formulas return the very same value.
As a side note, I think that in BISM this will be a very easy and convenient method to handle slowly changing dimensions even for the corporate BI environment since the current and historical values of any attribute can be easily computed with a minimal CPU effort and without the need to perform any update on the historical dimension on the database. I wish I had a similar functionality long time ago, when I needed to write ETL code to handle this scenario and spent some time in optimizations and locking issues…
This is just another example of what PowerPivot can do for your reporting needs. If you want to touch the real Karma of PowerPivot, don’t miss the opportunity to follow one of the workshops I and Marco Russo are bringing all over Europe in the next few months.
You can find more info on www.powerpivotworkshop.com. Hope to see you there!
Evaluates an expression in a context modified by filters.
CALCULATE ( <Expression> [, <Filter> [, <Filter> [, … ] ] ] )
When a column name is given, returns a single-column table of unique values. When a table name is given, returns a table with the same columns and all the rows of the table (including duplicates) with the additional blank row caused by an invalid relationship if present.
VALUES ( <TableNameOrColumnName> )
Returns all the rows in a table, or all the values in a column, ignoring any filters that might have been applied.
ALL ( [<TableNameOrColumnName>] [, <ColumnName> [, <ColumnName> [, … ] ] ] )
Returns the value in the column prior to the specified number of table scans (default is 1).
EARLIER ( <ColumnName> [, <Number>] )
Checks whether a value is blank, and returns TRUE or FALSE.
ISBLANK ( <Value> )
Returns a table that has been filtered.
FILTER ( <Table>, <FilterExpression> )
Returns the largest value in a column, or the larger value between two scalar expressions. Ignores logical values. Strings are compared according to alphabetical order.
MAX ( <ColumnNameOrScalar1> [, <Scalar2>] )