UPDATE 2021-06-29: also read Choosing Azure Analysis Services or Power BI Premium for large datasets – SQLBI containing updates about Power BI Premium Gen2.
I often get questions about use cases for Azure Analysis Services. So I wanted to do a quick recap of the points I commonly discuss in these conversations, and have them readily available for the next conversation… As a bonus, I get to share them with a larger audience!
The Tabular semantic model was introduced to a large number of Power BI users, who take advantage of the Analysis Services technology starting with the free Power BI Desktop. Indeed, the same engine powers Power BI, Power Pivot for Excel, SQL Server Analysis Services, and Azure Analysis Services. However, when dealing with a large dataset, you want to run the engine on a separate backend infrastructure. By large I mean databases with one or more tables over 200 million rows; the database size is just a consequence of the table size. Many of the considerations in this article do not apply to models that are large only because they have many columns and/or many unique values in smaller tables.
Today your backend options for a Tabular model are SQL Server Analysis Services, Azure Analysis Services, and Power BI Premium.
The legacy: SQL Server Analysis Services
When choosing SQL Server Analysis Services, you must manage all the details about hardware sizing, setup, deployment, configuration, and service upgrades. While this is a common choice in many companies, I also see Analysis Services deployed on large virtual machines that run on a server infrastructure not optimized for Analysis Services. To this day I still see big mistakes being made that could be avoided by following the suggestions in these articles published in 2015:
- Choosing the Right Hardware for Analysis Services Tabular
- Optimizing Hardware Settings for Analysis Services Tabular
Offloading hardware sizing and optimization is already a good reason to consider a move to cloud services: you can forget about these problems and focus on your data model instead. I would add another important reason: automatic updates of the engine. Getting new builds every other week means that you get new features and performance optimizations automatically. In my experience, this alone represents great value. I have seen too many companies where the installed version was many years old and not regularly updated, and this caused them to call me for some issue that had already been resolved months or years earlier in a cumulative update.
The present: Azure Analysis Services
Once you move to the cloud using Azure Analysis Services, you should pay attention to a few features/settings that could be important for large data models.
- Memory: the memory available for the service tier must hold both the databases you keep online and the temporary structures required to process them. You can compare this parameter to the RAM of a server: there should be enough RAM to store the compressed data and any other memory structure used by Analysis Services, including the query cache and the processing buffers. For example, a full process roughly doubles the memory footprint of a database, because the new copy is built while the previous one stays online. A large number of users connected at the same time can also increase the memory demand.
- Cost control: you can pause the service, paying only for the time you keep the database online. You can also change the service tier dynamically, increasing RAM and processing power to speed up a refresh, or reducing performance to save money during low workloads when pausing the service is not an option (see the sketch after this list).
- Scale-out control: Azure Analysis Services lets you control the number of query replicas to handle a large number of users connected simultaneously. Achieving the same in SQL Server Analysis Services is not easy and requires a big administrative effort.
- Segment size control: the DefaultSegmentRowSize property can be important in large data models; it is the same property available in SQL Server Analysis Services. The default in Analysis Services is 8 million rows, which is a good balance. Reducing the segment size also reduces the memory pressure at processing time, especially for tables with a large number of columns. However, a smaller segment size can also hurt query performance on tables with hundreds of millions of rows. I see companies reducing this setting to avoid out-of-memory errors during processing, but this is only a good idea for tables with 15-20 million rows. The recommendation is to avoid reducing this number for tables with more than 100 million rows.
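As an illustration of the cost control point, the pause/resume and tier changes can be automated against the Azure Resource Manager REST API. The following is a minimal sketch, assuming the azure-identity and requests Python packages, an identity with permissions on the server, and placeholder names for the subscription, resource group, server, and SKUs; the api-version is an assumption to verify against the current Microsoft.AnalysisServices documentation.

```python
# Sketch: pausing, resuming, and re-tiering an Azure Analysis Services server
# through the Azure Resource Manager REST API.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"      # hypothetical values
RESOURCE_GROUP = "<resource-group>"
SERVER = "<aas-server-name>"
API_VERSION = "2017-08-01"              # assumed ARM api-version for Microsoft.AnalysisServices

BASE = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.AnalysisServices/servers/{SERVER}"
)


def _headers() -> dict:
    # Acquire an ARM token with the signed-in identity (CLI, managed identity, ...).
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
    return {"Authorization": f"Bearer {token.token}", "Content-Type": "application/json"}


def pause() -> None:
    # Stop paying for compute while the server is not needed.
    requests.post(f"{BASE}/suspend?api-version={API_VERSION}", headers=_headers()).raise_for_status()


def resume() -> None:
    requests.post(f"{BASE}/resume?api-version={API_VERSION}", headers=_headers()).raise_for_status()


def scale(sku_name: str) -> None:
    # Temporarily move to a larger tier (e.g. before a refresh), then back down.
    requests.patch(
        f"{BASE}?api-version={API_VERSION}", headers=_headers(), json={"sku": {"name": sku_name}}
    ).raise_for_status()


if __name__ == "__main__":
    scale("S4")   # hypothetical SKU: more RAM and QPUs for the nightly refresh
    # ... trigger the refresh here ...
    scale("S1")   # back to the regular tier for daytime queries
```

Running a script like this from a scheduled automation job is a common way to keep the server paused or downsized outside business hours.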
My experience is that Azure Analysis Services can be a good choice for large data models. You certainly have to pay attention to the service tier, which corresponds to the hardware choice in the on-premises world. The article Choosing tier in Azure Analysis Services can help you estimate the RAM and CPU you need for your service.
The future: Power BI Premium
The long-term goal for Microsoft is to provide a superset of the Azure Analysis Services features in Power BI Premium. A critical feature to enable this scenario is the availability of the XMLA read/write endpoint, which gives client tools access to the TOM API and makes the development tools built for Analysis Services available for Power BI Premium as well. It also exposes all the features of the Tabular semantic model through the TOM API, without being limited to the features exposed in the Power BI Desktop user interface.
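To give an idea of what the XMLA endpoint enables from client code, here is a minimal sketch that runs a DAX query against a dataset in a Power BI Premium workspace using the third-party pyadomd package, which wraps the ADOMD.NET client (Windows only). The workspace name, dataset name, ADOMD.NET path, and connection string details are assumptions to adapt to your environment; TOM-based deployment and scripting tools connect through the same powerbi:// endpoint.

```python
# Minimal sketch: querying a Power BI Premium dataset through the XMLA endpoint
# with pyadomd (a wrapper around the ADOMD.NET client). The workspace name,
# dataset name, and library path below are hypothetical.
from sys import path

path.append(r"C:\Program Files\Microsoft.NET\ADOMD.NET\150")  # assumed ADOMD.NET location

from pyadomd import Pyadomd

CONNECTION = (
    "Data Source=powerbi://api.powerbi.com/v1.0/myorg/Contoso Sales;"  # workspace XMLA endpoint
    "Initial Catalog=Sales;"                                           # dataset (database) name
)

DAX_QUERY = 'EVALUATE ROW("Rows", COUNTROWS(Sales))'

with Pyadomd(CONNECTION) as connection:
    with connection.cursor().execute(DAX_QUERY) as cursor:
        print(cursor.fetchall())
```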
As soon as the XMLA read/write endpoint is publicly available, you can consider moving your workload to Power BI Premium with the same considerations described earlier in this article. Moreover, you will also be able to use these additional features, which are not available in Azure Analysis Services:
- Composite models: Power BI can create models that mix several data sources in DirectQuery mode and use VertiPaq as a cache. While keeping the data in memory in VertiPaq is the best choice whenever possible, sometimes it is not an option. A DirectQuery model in Azure Analysis Services can have only a single data source.
- Aggregations: this feature is currently tied to composite models only, so there are no aggregations over large tables stored in memory (VertiPaq). I hope Microsoft will one day extend aggregations to VertiPaq models.
- Incremental refresh: this is a Power BI feature that simplifies the work required to implement an incremental refresh. Azure Analysis Services can be controlled in a similar way, but you have to write a bit of script (see the sketch after this list).
- Model eviction: a model is evicted from memory when it is not being used, and it is dynamically loaded back in memory when a query needs it. This greatly optimizes memory usage, at the cost of some latency for models that are used infrequently and could therefore be evicted often.
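To give an idea of the "bit of script" mentioned in the incremental refresh point, here is a minimal sketch in Python that builds the TMSL refresh command you would send to Azure Analysis Services through the XMLA endpoint; the database name, table name, and one-partition-per-month naming scheme are assumptions.

```python
# Minimal sketch: building the TMSL "refresh" command that an incremental
# refresh script for Azure Analysis Services typically sends through the
# XMLA endpoint. Database, table, and partition names are hypothetical.
import json
from datetime import date
from typing import List


def monthly_partition_name(month: date) -> str:
    # Assumed convention: partitions are named "Sales 2021-05", "Sales 2021-06", ...
    return f"Sales {month.year}-{month.month:02d}"


def incremental_refresh_command(months_to_refresh: List[date]) -> str:
    command = {
        "refresh": {
            "type": "full",  # reprocess only the partitions in the rolling window
            "objects": [
                {
                    "database": "Contoso",
                    "table": "Sales",
                    "partition": monthly_partition_name(month),
                }
                for month in months_to_refresh
            ],
        }
    }
    return json.dumps(command, indent=2)


if __name__ == "__main__":
    # Refresh the current and the previous month; older partitions stay untouched.
    print(incremental_refresh_command([date(2021, 6, 1), date(2021, 5, 1)]))
```

The resulting JSON can then be executed against the server with SSMS, a PowerShell task, or an automation job; creating new partitions as the rolling window moves forward follows the same pattern with other TMSL commands.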
Clearly, there are many other features in Power BI Premium, but this post was focused only on features related to large datasets.