I am proud to announce the second version of the Contoso Data Generator!
In January 2022, we released the first version of an open-source project to create a sample relational database for semantic models in Power BI and Analysis Services. That version focused on creating a SQL Server database as a starting point for the semantic model.
We invested in a new version to support more scenarios and products! Yes, Power BI is our primary focus, but 90% of our work can be helpful for other platforms and architectures, so… why not?
The result is an open-source project that generates data for a fictitious company (Contoso) by controlling several parameters for data distribution. We used this tool to create several ready-to-use sets of data that you can just download. This is the quickest way to get a complete data set ranging from 10,000 to 100,000,000 orders, depending on the volume you need for your tests or demos.
As in the previous version, we started from the Microsoft Contoso sample database and added stores and customers obtained by random data generation services. The transactions generated in the Sales table are not completely random: you can control the distribution of transactions over products, customers, stores, and time, generating different databases with specific trends and exceptions for your demos.
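To give an idea of what "not completely random" means, here is a minimal Python sketch of weighted sampling over time. It is not the generator's actual algorithm or configuration format: the function name `generate_order_months`, the month labels, and the weights are all hypothetical, chosen only to illustrate how a parameter can skew the distribution of orders.

```python
import random
from collections import Counter

def generate_order_months(n_orders, month_weights, seed=42):
    """Pick a month for each order, proportionally to the given weights.

    Hypothetical helper for illustration only; the real tool uses its own
    configuration files and logic.
    """
    rng = random.Random(seed)  # fixed seed so the demo data is repeatable
    months = list(month_weights)
    weights = [month_weights[m] for m in months]
    return rng.choices(months, weights=weights, k=n_orders)

# Assumed weights: December should get about 4x the orders of October.
month_weights = {"Oct": 1.0, "Nov": 2.0, "Dec": 4.0}
counts = Counter(generate_order_months(10_000, month_weights))

for month in month_weights:
    print(month, counts[month])
```

Changing the weights changes the trend in the resulting data, and the same idea extends to products, customers, and stores.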
Ready-to-use sets of data
You can download a ready-to-use set of data by choosing from the many combinations of format and size available.
Formats supported:
- bak: backup files for SQL Server databases
- csv: files in CSV format
- delta: files in Delta Table format
- parquet: files in Parquet format
- pbit: template file to import a SQL Server database into a Power BI Desktop model
- pbix: files in Power BI Desktop format
The size is expressed as an approximate number of orders. The number of transactions (rows in the Sales table) is larger: roughly twice the number of orders.
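As a rough sizing aid, the relationship between orders and Sales rows can be sketched in a few lines of Python. The factor of 2 comes from the approximation above; the function name `estimate_sales_rows` is only illustrative, and the actual ratio in a given data set may differ.

```python
# Assumption: about 2 Sales rows per order, as stated in the text above.
ROWS_PER_ORDER = 2.0

def estimate_sales_rows(orders, rows_per_order=ROWS_PER_ORDER):
    """Return an approximate number of rows in the Sales table."""
    return round(orders * rows_per_order)

for orders in (10_000, 1_000_000, 100_000_000):
    rows = estimate_sales_rows(orders)
    print(f"{orders:>11,} orders -> about {rows:,} Sales rows")
```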
If you want to upload the files to a Fabric Lakehouse, use one of the several techniques described in the documentation.
The tables available are Customers, Stores, Dates, CurrencyExchanges, Sales, Orders, and OrderRows. Sales is just the denormalized version of Orders and OrderRows: we decided to keep both versions to support a larger number of demos – like showing the impact of a header-detail structure (Orders–OrderRows) in a model and comparing that with a regular star schema (Sales).
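The relationship between the two structures can be illustrated with a small Python sketch that joins a header table with a detail table to obtain a flat table. The column names (`OrderKey`, `CustomerKey`, `OrderDate`, `LineNumber`, `ProductKey`, `Quantity`) and the sample rows are assumptions made for this example and may not match the actual columns in the downloadable files.

```python
# Hypothetical header rows: one row per order.
orders = [
    {"OrderKey": 1, "CustomerKey": 101, "OrderDate": "2024-01-05"},
    {"OrderKey": 2, "CustomerKey": 102, "OrderDate": "2024-01-06"},
]

# Hypothetical detail rows: one row per product in each order.
order_rows = [
    {"OrderKey": 1, "LineNumber": 1, "ProductKey": 10, "Quantity": 2},
    {"OrderKey": 1, "LineNumber": 2, "ProductKey": 11, "Quantity": 1},
    {"OrderKey": 2, "LineNumber": 1, "ProductKey": 10, "Quantity": 5},
]

def denormalize(orders, order_rows):
    """Copy the header columns onto each detail row (header-detail -> flat)."""
    headers = {o["OrderKey"]: o for o in orders}
    return [{**headers[r["OrderKey"]], **r} for r in order_rows]

sales = denormalize(orders, order_rows)
for row in sales:
    print(row)
```

The flat result has one row per order line, with the order-level columns repeated on every line, which is the shape a star schema expects for a fact table.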
Create your customized version of Contoso
You can download the tool executable, customize a few text configuration files, and then run a command-line script to generate the files in CSV, Parquet, or Delta Table format. Optionally, you can import the CSV files into Microsoft SQL Server using specific SQL scripts. No knowledge of SQL or C# is required: just use the existing scripts and tools!
Customize the C# code of Contoso Data Generator V2
The Contoso Data Generator is an open-source C# tool available on GitHub under the MIT License, which is pretty flexible! However, if you want to add features, please consider creating a pull request so other users can benefit from your changes!