Just a few days before my session about Power Query at TechEd 2014, Microsoft released a new update that enables the scheduled data refresh of a Power Pivot workbook containing Power Query transformations.
This is a very good news, because it enables the data refresh of a number of different data sources. Even if the number of providers supported by this release is limited (only SQL Server and Oracle), you can use a SQL Server database as a bridge to access different data sources through views using Linked Server connections.
If you want to use this feature, first of all read carefully the Scheduled Data Refresh for Power Query blog post on MSDN web site. It guides you through are the steps required in order to enable the data source connection through the Data Management Gateway. As you will see, in reality you need to create the data source connections corresponding to the Power Query databases you use. Thus, in reality you might skip the data source configuration if you already have the corresponding databases enabled in the Power BI admin center. However, I suggest you to go through the steps described in that blog post at the beginning, because if the same database has two different drivers, it needs two different data sources. For this reason, I have a number of notes that might be helpful to avoid certain issues.
- Power Query uses the .NET Framework Data Provider for SQL Server and Oracle Data Provider for .NET, whereas Power Pivot by default creates a SQL Server connection using the SQL Server Native Client 11.0 (SQLNCLI11).
- Even if you already created a data source for a SQL Server database you refresh in a Power Pivot workbook, you have to create another data source for the same SQL Server database for Power Query, because you use two different drivers.
- You might consolidate these data sources to only one, by changing the data provide in the advanced options of a Power Pivot configuration, but I am not sure this is a good idea. I would keep the two version of data sources, one for each provider, in case I use the same database in both connections
- Power Query creates one connection string in Excel for each query you create. The connection string contains the entire transformation and when you copy it in the New Data Source page in Power BI admin, the internal query is analyzed to extract the required connection to SQL Server. If these connections are already configured as Power BI data sources, then you don’t need to do anything else. I suggest you to iterate all the queries you have following this step until you are confident of the internals and you are sure the required data sources are already available.
- Even if you create a single query in M language accessing to different databases, the referenced connections will be found and each database will have a separate data source configuration in Power BI. I was worried that loading multiple tables from different database on the same server would have produced a single data source enabling to access all the databases on the server, but luckily this does not happen and security is preserved!
- I spot an issue using certain DateTimeZone functions (DateTimeZone.FixedLocalNow, DateTimeZone.FixedUtcNow, DateTimeZone.LocalNow, and DateTimeZone.UtcNow) that seem not working with scheduled data refresh. You can read more about such issue in this thread on Power Query MSDN forum. I found a workaround using the Table.Buffer function, so that by stopping query folding the expression is not translated in SQL but evaluated directly by the Power Query engine. However, I hope this will be fixed soon.
- A Power Query transformation that contains only a script, without accessing to any data source, currently is not refreshed. This would be useful for generating a Date table, I opened this other thread about this issue on the forum, I hope there will be news on that, too.
- In the same thread you will find another tip: the literal in the form #literal, such as #table, are being mis-analyzed by scheduled refresh, but at least for this issue there are workarounds available, until the issue is not fixed by Microsoft.
- You can use SQL Server views based on linked servers to overcome the limitation of providers currently supported by Data Management Gateway (which is the component used by scheduled data refresh).
- Now that it is possible to publish SSIS packages as OData Feed Sources, you can expose a SQL Server view to Power BI, and accessing it from Power Pivot or Power Query, you can execute SSIS packages at refresh time. If the package is not too long to execute (it would timeout the connection), this is a smart way to arrange execution of some small “corporate ETL” in sync with the data refresh on Power BI, without relying on synchronized scheduling dates (which is always one more thing to maintain). This further extends the range of providers you can use with scheduled data refresh.
I would like to get more detailed errors when something goes wrong and scheduled data refresh stops, but this is a good start.