UPDATE 2017-07-12: please note this article was written in 2010, there are now better ways to obtain the same result. Please read these articles: Physical and Virtual Relationships in DAX and Propagate filters using TREATAS in DAX
PowerPivot supports only one type of relationship between two tables, which is the one-to-many relationship. You can define that a column in a table (the “many” side) corresponds to a lookup table through a column which is an identity column there (the “one” side). DAX provides support to follow this relationship through functions such as RELATED and RELATEDTABLE.
As I already wrote in this blog, many-to-many relationships are not directly supported by DAX and we can work-around that by writing more or less complex DAX expressions. However, in that case the many-to-many relationship was still partially supported by two existing one-to-many relationships. Sometimes there cases where the relationship would require the creation of new tables that we do not have and it might be useful to implement the same concept of relationship by using only DAX and nothing else. The scenario that I will show will be the cost simulation by using several couriers. But before that, let’s learn how to use DAX to get the same RELATED function behavior without having an underlying relationship in the model.
IMPORTANT NOTE: in a following post we will see how we can optimize performance of the shipment cost simulation sample with a model based on relationships between tables. However, the sample in this post can be easier to implement for an end user. With medium data sets (below million of rows) performance are still good.
Simulate a table relationship in DAX
We have an Orders table and for each Order there is a Country where the order will be shipped. The Weight column is the parameter necessary to calculate the shipment cost (it could be the weight in kg, for example).
OrderNumber
Country
Weight
SO43697
Canada
35.50
SO43701
Australia
52.00
Then we have a PriceList table with shipment cost for each Country.
Country
Cost
Australia
3.50
Canada
0.50
France
2.50
Germany
2.60
Italy
2.70
USA
0.40
If we had the relationship between these two tables, the calculated column for each order would be:
‘Orders'[ShipmentCost] = ‘Orders'[Weight] * RELATED(‘PriceList'[Cost] )
If instead than a calculated column we would like defining a calculated measure, it will be:
‘Orders'[ShipmentCost] = SUMX( ‘Orders’, ‘Orders’ [Weight] * RELATED( ‘PriceList'[Cost] ) )
But what if we don’t have a relationship? Imagine we don’t have a RELATED function. Well, the idea is that we have to filter the row containing the same destination country of the order we are calculating. So, we’d like to replace the expression
RELATED( ‘PriceList'[Cost] )
with something like
CALCULATE(
VALUES( ‘PriceList'[Cost] ),
‘PriceList'[Country] = ‘Orders'[Country] )
We wrote VALUES( ‘PriceList'[Cost] ) because we expect a single value will be returned and VALUES avoid the syntax error that we would have got writing only ‘PriceList'[Cost] in the first parameter of CALCULATE. However, the whole expression is not going to work. This is the error:
The expression contains multiple columns, but only a single column can be used in a Boolean expression that is used as a table filter expression.
I have to say that this error is not very clear. The problem is that ‘PriceList'[Country] = ‘Orders'[Country] does not give an indication of what is the table that we want to filter (remember that a filter expression in CALCULATE always has its effects on a table, see this previous post). What we want to do is to filter rows of the PriceList table, so that our VALUES function will return only the row of the country we have in our Order. We can use a FILTER function instead, returning only the rows from PriceList table which satisfy the condition:
CALCULATE(
VALUES( ‘PriceList'[Cost] ),
FILTER( ‘PriceList’,
‘PriceList'[Country] = ‘Orders'[Country]
)
)
Replacing the original RELATED call with the CALCULATE expression above, we can define the calculated measure ShipmentCost in this way (in a packed form):
‘Orders'[ShipmentCost] = SUMX( ‘Orders’, ‘Orders’ [Weight] *
CALCULATE( VALUES( ‘PriceList'[Cost] ),
FILTER( ‘PriceList’,
‘PriceList'[Country] = ‘Orders'[Country] ) ) )
The resulting PivotTable will show this calculation.
Row Labels
Sum of Weight
ShipmentCost
SO43697
35.5
17.75
Canada
35.5
17.75
SO43701
52
182
Australia
52
182
Grand Total
87.5
199.75
Table relationship through multiple columns in DAX
Now that we know how to simulate the relationship in DAX, we can also use multiple columns for defining a relationship. For example, imagine we have Country and Zone columns in the Orders table.
OrderNumber
Country
Zone
Weight
SO43697
Canada
1
35.50
SO43701
Australia
2
52.00
Our PriceList now contains different prices for each Country and Zone.
Country
Zone
Cost
Australia
1
2.90
Australia
2
3.20
Australia
3
3.50
Canada
1
0.50
Canada
2
0.55
Canada
3
0.60
We can add another condition to the FILTER on PriceList table, using two boolean expression in an AND condition: we want to have the same Country and the same Zone. The following formula makes the thing working:
‘Orders'[ShipmentCost] = SUMX( ‘Orders’, ‘Orders’ [Weight] *
CALCULATE( VALUES( ‘PriceList'[Cost] ),
FILTER( ‘PriceList’,
‘PriceList'[Country] = ‘Orders'[Country]
&&
‘PriceList'[Zone] = ‘Orders'[Zone] ) ) )
The resulting PivotTable will show the calculation of ShipmentCost for each order considering the price based on Country and Zone.
Row Labels
Sum of Weight
ShipmentCost
SO43697
35.5
17.75
Canada
35.5
17.75
1
35.5
17.75
SO43701
52
166.4
Australia
52
166.4
2
52
166.4
Grand Total
87.5
184.15
Shipment cost simulation for different Couriers
At this point we want to simulate the cost using different Couriers. Using the same Orders table we have seen before, we add a Courier column to the PriceList table.
Courier
Country
Zone
Cost
Blu Express
Australia
1
2.90
Blu Express
Australia
2
3.20
Blu Express
Australia
3
3.50
Blu Express
Canada
1
0.50
Blu Express
Canada
2
0.55
Blu Express
Canada
3
0.60
SpeedyMail
Australia
1
3.00
SpeedyMail
Australia
2
3.15
SpeedyMail
Australia
3
3.45
SpeedyMail
Canada
1
0.45
SpeedyMail
Canada
2
0.50
SpeedyMail
Canada
3
0.55
Now have to use the Courier attribute in the resulting PivotTable, otherwise there will be an error in the ShipmentCost calculation because there are more prices for each Country/Zone combination.
Row Labels
Sum of Weight
ShipmentCost
Blu Express
87.5
184.15
SO43697
35.5
17.75
SO43701
52
166.4
SpeedyMail
87.5
179.775
SO43697
35.5
15.975
SO43701
52
163.8
Grand Total
87.5
#VALUE!
Having an error for the Grand Total row is probably right. We want to compare different Couriers, simulating a complete calculation for each order using different price lists. Looking at aggregated data, without the order granularity, we will find the most convenient Courier. Data can also diced by Country if we can make local agreements. However, we might want to have a reference value for the Grand Total row and, in general, a valid value for ShipmentCost even when a Courier is not selected. In these cases, we can replace the VALUES( ‘PriceList'[Cost] ) expression in the CALCULATE statement with AVERAGE( ‘PriceList'[Cost] ), which will use the average price of the selected Couriers (we can always use slicers) in the final total row.
Row Labels
Sum of Weight
ShipmentCost
Blu Express
87.5
184.15
SO43697
35.5
17.75
SO43701
52
166.4
SpeedyMail
87.5
179.775
SO43697
35.5
15.975
SO43701
52
163.8
Grand Total
87.5
181.9625
Final considerations
The scenarios we have described are useful only when the use of relationships is not possible. This can be the case of a particular complex condition to relate two tables, or simply it could be hard or time wasting to generate the necessary intermediate tables to define the relationships between tables containing real data.
From a performance point of view, it is always better to define a model that solve the problem leveraging relationships, using DAX just to make calculation but not to filter data. Using relationships, filtering is implicit in DAX formulas and this helps the engine to produce results faster. However, having a complete toolset to solve real business problems helps you to choose the fastest path to create a working PowerPivot model.
UPDATE: this post seems to qualify for T-SQL Tuesday post!
Returns the sum of an expression evaluated for each row in a table.
SUMX ( <Table>, <Expression> )
Evaluates an expression in a context modified by filters.
CALCULATE ( <Expression> [, <Filter> [, <Filter> [, … ] ] ] )
When a column name is given, returns a single-column table of unique values. When a table name is given, returns a table with the same columns and all the rows of the table (including duplicates) with the additional blank row caused by an invalid relationship if present.
VALUES ( <TableNameOrColumnName> )
Returns a table that has been filtered.
FILTER ( <Table>, <FilterExpression> )
Checks whether all arguments are TRUE, and returns TRUE if all arguments are TRUE.
AND ( <Logical1>, <Logical2> )
Converts a text string that represents a number to a number.
VALUE ( <Text> )
Returns the average (arithmetic mean) of all the numbers in a column.
AVERAGE ( <ColumnName> )