In DAX you don’t have the LIKE operator and you have to use SEARCH instead. However, performance are not very good and it is better to use LEFT and RIGHT if you just need to compare the initial (or ending) match of a string.
I just wrote an article about string comparison in DAX. During my exploration of LIKE replacement functions in DAX, I have found that documentation of SEARCH for PowerPivot v1 is wrong – in DAX the SEARCH function is always case-insensitive, whereas FIND is always case-sensitive. Moreover, in PowerPivot v1 you had to use IFERROR to catch the string not found condition. This was a big issue in performance and in PowerPivot v2 (and in BISM Tabular) the new SEARCH and FIND functions have a fourth parameter that specify the value that should be returned when a match is not found. Using FIND seems to be 10% faster than SEARCH, just because it is case-insensitive.
I’d like to thank Marius Dumitru for the feedback he provided me in this analysis. Here are a few best practices he also suggest:
- Always avoid IFERROR and ISERROR (everywhere, not just for string functions).
- Use LEFT instead of FIND/SEARCH for expressing “starts with” conditional expressions on strings
- Use FIND instead of SEARCH if your comparison is (or can be) case-sensitive.
- When FIND or SEARCH are needed, use their new flavors taking a 4th parameter (added in SQL11)
If you are curious to see a few performance numbers, at the end of the post I includes the timing for a few query over a Tabular model based on an IISLog table. I included a best-practice table conversion in my article. If you think that a regular expression function would be useful in DAX, you can vote this suggestion on Connect.
— Each DAX query is preceded by a comment with total execution time
— The original condition over IISLog[cIp] column is this one in SQL:
— WHERE cIp NOT LIKE ‘217.57.131.11%’
— AND cIp NOT LIKE ‘11.22.33.3%’
— AND cIp NOT LIKE ‘127.0.0.1’
— AND cIp NOT LIKE ‘192.168.%’
—
— Other conditions tested in a few queries are:
— AND csUriStem LIKE ‘%SQLBI%Methodology%at%work%pdf’
— AND csUserAgent NOT LIKE ‘%bot%’
— 28 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
FILTER( ALL(IISLog[csUserAgent]),
SEARCH( “bot”, IISLog[csUserAgent], 1, 0 ) = 0
),
FILTER( ALL(IISLog[csUriStem]),
SEARCH( “SQLBI*Methodology*at*work*pdf”, IISLog[csUriStem], 1, 0 ) <> 0
),
[cIp] <> “127.0.0.1”
&& LEFT( [cIp], 12 ) <> “11.22.33.3”
&& LEFT( [cIp], 8 ) <> “192.168.”
&& LEFT( [cIp], 13 ) <> “217.57.131.11”
)
)
— 15 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
FILTER( ALL(IISLog[csUserAgent]),
SEARCH( “bot”, IISLog[csUserAgent], 1, 0 ) = 0
)
)
— 11 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
FILTER( ALL(IISLog[csUriStem]),
SEARCH( “SQLBI*Methodology*at*work*pdf”, IISLog[csUriStem], 1, 0 ) <> 0
)
)
— 9 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
FILTER( ALL(IISLog[csUriStem]),
SEARCH( “SQLBI”, IISLog[csUriStem], 1, 0 ) <> 0
)
)
— 54 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
IFERROR( SEARCH( “217.57.131.11”, IISLog[cIp] ), -1 ) < 0
&& IFERROR( SEARCH( “11.22.33.3”, IISLog[cIp] ), -1 ) < 0
&& IFERROR( SEARCH( “127.0.0.1”, IISLog[cIp] ), -1 ) < 0
&& IFERROR( SEARCH( “192.168.”, IISLog[cIp] ), -1 ) < 0
)
)
— 16 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
SEARCH( “217.57.131.11”, IISLog[cIp], 1, 0 ) = 0
&& SEARCH( “11.22.33.3”, IISLog[cIp], 1, 0 ) = 0
&& SEARCH( “127.0.0.1”, IISLog[cIp], 1, 0 ) = 0
&& SEARCH( “192.168.”, IISLog[cIp], 1, 0 ) = 0
)
)
— 14 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
FIND( “217.57.131.11”, IISLog[cIp], 1, 0 ) = 0
&& FIND( “11.22.33.3”, IISLog[cIp], 1, 0 ) = 0
&& FIND( “127.0.0.1”, IISLog[cIp], 1, 0 ) = 0
&& FIND( “192.168.”, IISLog[cIp], 1, 0 ) = 0
)
)
— 12 seconds
EVALUATE ROW( “result”,
FILTER( DISTINCT(IISLog[cIp]),
[cIp] <> “127.0.0.1”
&& SEARCH( “11.22.33.3”, [cIp], 1, 0 ) <> 1
&& SEARCH( “192.168.”, [cIp], 1, 0 ) <> 1
&& SEARCH( “217.57.131.11”, [cIp], 1, 0 ) <> 1
)
)
)
— 12 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
[cIp] <> “127.0.0.1”
&& SEARCH( “11.22.33.3”, [cIp], 1, 0 ) <> 1
&& SEARCH( “192.168.”, [cIp], 1, 0 ) <> 1
&& SEARCH( “217.57.131.11”, [cIp], 1, 0 ) <> 1
)
)
— 2 seconds
EVALUATE
ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),
[cIp] <> “127.0.0.1”
&& LEFT( [cIp], 12 ) <> “11.22.33.3”
&& LEFT( [cIp], 8 ) <> “192.168.”
&& LEFT( [cIp], 13 ) <> “217.57.131.11”
)
)
Returns the starting position of one text string within another text string. SEARCH is not case-sensitive, but it is accent-sensitive.
SEARCH ( <FindText>, <WithinText> [, <StartPosition>] [, <NotFoundValue>] )
Returns the specified number of characters from the start of a text string.
LEFT ( <Text> [, <NumberOfCharacters>] )
Returns the specified number of characters from the end of a text string.
RIGHT ( <Text> [, <NumberOfCharacters>] )
Returns the starting position of one text string within another text string. FIND is case-sensitive and accent-sensitive.
FIND ( <FindText>, <WithinText> [, <StartPosition>] [, <NotFoundValue>] )
Returns value_if_error if the first expression is an error and the value of the expression itself otherwise.
IFERROR ( <Value>, <ValueIfError> )
Checks whether a value is an error, and returns TRUE or FALSE.
ISERROR ( <Value> )
Checks whether all arguments are TRUE, and returns TRUE if all arguments are TRUE.
AND ( <Logical1>, <Logical2> )
Evaluates a table expression in a context modified by filters.
CALCULATETABLE ( <Table> [, <Filter> [, <Filter> [, … ] ] ] )
Returns a single row table with new columns specified by the DAX expressions.
ROW ( <Name>, <Expression> [, <Name>, <Expression> [, … ] ] )
Counts the number of distinct values in a column.
DISTINCTCOUNT ( <ColumnName> )
Returns a table that has been filtered.
FILTER ( <Table>, <FilterExpression> )
Returns all the rows in a table, or all the values in a column, ignoring any filters that might have been applied.
ALL ( [<TableNameOrColumnName>] [, <ColumnName> [, <ColumnName> [, … ] ] ] )
Counts the number of rows in a table.
COUNTROWS ( [<Table>] )
Returns a one column table that contains the distinct (unique) values in a column, for a column argument. Or multiple columns with distinct (unique) combination of values, for a table expression argument.
DISTINCT ( <ColumnNameOrTableExpr> )