During the last PASS Summit I and Alberto Ferrari had long discussions at SQL Clinic with some of the developers of our loved tools. Sometime you really have to dig in the internals of certain feature to understand why there are some “unexpected” behaviors for certain functions. One of the discussions was about SUMMARIZE. This function can be very powerful (after all, it’s a way to do a join between related tables…) but also very dangerous, because of the way it implements its logic (especially for the ROLLUP condition).
The rule of thumb, that we already mentioned in the past, is to use SUMMARIZE only as a way to execute a sort of SELECT DISTINCT, and not to create column to aggregate values. Use ADDCOLUMNS for this other job. We previously mentioned mainly performance reasons for that, but now we have a more complete description of why you should avoid SUMMARIZE for computing aggregations: you might obtain a different result than the one expected. The complete discussion of the issue and of the workarounds is included in the new article All the secrets of SUMMARIZE written by Alberto Ferrari.
Creates a summary of the input table grouped by the specified columns.
SUMMARIZE ( <Table> [, <GroupBy_ColumnName> [, [<Name>] [, [<Expression>] [, <GroupBy_ColumnName> [, [<Name>] [, [<Expression>] [, … ] ] ] ] ] ] ] )
Identifies a subset of columns specified in the call to SUMMARIZE function that should be used to calculate subtotals.
ROLLUP ( <GroupBy_ColumnName> [, <GroupBy_ColumnName> [, … ] ] )
Returns a one column table that contains the distinct (unique) values in a column, for a column argument. Or multiple columns with distinct (unique) combination of values, for a table expression argument.
DISTINCT ( <ColumnNameOrTableExpr> )
Returns a table with new columns specified by the DAX expressions.
ADDCOLUMNS ( <Table>, <Name>, <Expression> [, <Name>, <Expression> [, … ] ] )