UPDATE 2021-02-25: this article is valid only for legacy data sources in Analysis Services. If you use an M data source, you have to override the data source.
In Analysis Services you can process a table in a tabular model using several techniques: you can process the whole table, you can split the table in several partitions and process a single partition, you can merge partitions, and you can incrementally process a single partition by using ProcessAdd. This article is an updated version of the original Incremental Processing in Tabular using Process Add, which is still valid if you have to manage tabular models in compatibility version between 1050 and 1103. The techniques described in this article can be applied only to model compatibility levels 1200 or higher.
The ProcessAdd command allows you to add only a few rows to an existing partition and to do that you have to specify the query that reads data from the data source, applying the necessary WHERE condition in a SQL query or by using any other SQL statement to this purpose. The ProcessAdd command available in Process Partition(s) dialog box in SQL Server Management Studio (SSMS) does not allow you to specify a custom query for the process operation, in order to filter only new rows that have to be added to the partition. However, this is not a big issue: in fact, if you need to use ProcessAdd, probably you need to automate that command in a batch process. Thus, a programmatic approach is required. A TMSL script command is required and you will see how to programmatically obtain it by using TOM and PowerShell.
It is out of scope of this article describing how you should define the SQL command that only returns the new rows to be added to the table in a tabular model. Remember that it is your responsibility avoiding duplicate rows in the destination table. There is no automatic detection of duplicates and if the table does not have unique columns you would obtain row duplicates in your table as a result, otherwise the process operation will stop with an error if a unique condition for a column is violated by loading new data.
Technically, when a ProcessAdd runs, Analysis Services internally creates a new partition, processes the whole new partition, and then merges it to the target partition (the one on which ProcessAdd command has been executed). You can use this same approach by using separate operations, but ProcessAdd can be more optimized for this specific activity.
In the following sections you will see how to execute and automate ProcessAdd by using different tools.
ProcessAdd with TMSL Script
In order to process a table partition in Tabular, you have to issue a Process Add command to the target partition of a table. The overrides section replaces the existing query binding on the partition with a different source query, which will be used just for this process command. For example, the following script executes a process add operation over the partition Sales of the table Sales in the Contoso database.
{ "refresh":{ "type":"add", "objects":[ { "database":"Contoso", "table":"Sales", "partition":"Sales" } ], "overrides":[ { "partitions":[ { "originalObject":{ "database":"Contoso3", "table":"Sales", "partition":"Sales" }, "source":{ "query":"SELECT * FROM [Analytics].[Sales] WHERE [Order Date] >= '20170211'" } } ] } ] } }
The source query element contains the query that will be used by the ProcessAdd operation. You can use any valid SQL statement here that will produce the same columns required by the partition that you are going to incrementally update. In the example, a SELECT statement has been used, filtering all the orders having an order date greater than or equal to February 11th, 2017. It is up to you to define a safe filter condition and to avoid loading the same rows multiple times (for example, with this filter condition if an order has a date of February 12th and the day after you filter orders greater than or equal to February 12th, that order row will be loaded twice).
ProcessAdd with Integration Services
At the moment of writing, SQL Server Integration Services does not provide a direct support to the Process Add operation for models in compatibility level 1200 or higher. The Analysis Services Processing Task component only supports other process operation over tables and partitions of a tabular model, but you cannot override the source query, and the Process Add option is not available in refresh types of this component.
ProcessAdd with TOM
Using the Analysis Management Objects (AMO) and TOM (Tabular Object Model) libraries, you can generate the same Process Add command you have seen in TMSL Script. When you execute the RequestRefresh method, you can provide a collection of objects that can override certain settings of the original model, such as the source query to use for reading the rows to add to the partition.
using System.Collections.Generic; using Microsoft.AnalysisServices.Tabular; using Microsoft.AnalysisServices.Tabular.DataRefresh; namespace ListTables { class Program { static void Main(string[] args) { string serverName = @"gap\tabular"; string databaseName = "Contoso"; Server server = new Server(); server.Connect(serverName); Database db = server.Databases[databaseName]; Table tableSales = db.Model.Tables["Sales"]; Partition partition = tableSales.Partitions[0]; OverrideCollection oc = new OverrideCollection { Partitions = { new PartitionOverride { OriginalObject = partition, Source = new QueryPartitionSourceOverride { DataSource = ((QueryPartitionSource)partition.Source).DataSource, Query = "SELECT * FROM [Analytics].[Sales] WHERE [Order Date] >= '20170211'" } } } }; var listOc = new List<OverrideCollection>(); listOc.Add(oc); partition.RequestRefresh( RefreshType.Add, listOc ); db.Model.SaveChanges(); } } }
ProcessAdd with PowerShell
Once you know how to create the desired process command with AMO and TOM, you can translate that code in a PowerShell script. For example, the C# code you have seen in the previous section can be translated into the following PowerShell script:
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices.Tabular") $server = New-Object Microsoft.AnalysisServices.Tabular.Server $server.Connect("localhost\tabular") $db = $server.Databases["Contoso"] $model = $db.Model $tableSales = $model.Tables["Sales"] $partition = $tableSales.Partitions[0] $source = New-Object Microsoft.AnalysisServices.Tabular.DataRefresh.QueryPartitionSourceOverride $source.DataSource = $partition.Source.DataSource $source.Query = "SELECT * FROM [Analytics].[Sales] WHERE [Order Date] >= '20170211'" $po = New-Object Microsoft.AnalysisServices.Tabular.DataRefresh.PartitionOverride $po.OriginalObject = $partition $po.Source = $source $oc = New-Object Microsoft.AnalysisServices.Tabular.DataRefresh.OverrideCollection $oc.Partitions.Add($po) $listoc = New-Object "System.Collections.Generic.List[Microsoft.AnalysisServices.Tabular.DataRefresh.OverrideCollection]" $listoc.Add($oc) $partition.RequestRefresh("Add", $listoc ) $model.SaveChanges();
Conclusion
Incremental processing of tables in a tabular model is possible, but it requires specific commands that are not well documented on MSDN. This article shown several ways to execute an incremental Process Add command on a table in a tabular model, providing a temporary query binding that identifies only the rows to load in the ProcessAdd batch, without changing underlying tabular structure or views in SQL Server. You can use the same pattern, choosing the technique that better adapts to your periodic process batch operation.