First of all, a praise to Ralph Kimball: in the last days I read for the second time his The Data Warehouse Toolkit and this second read has been very inspiring to me. This time I’ve had the perspective of Analysis Services 2005 (SSAS) as OLAP target for data marts, and I have to say that many of the features of SSAS are probably inspired by this same book; when I first read the book, I had too much AS2000 limitations in my head and almost immediately forgot some design pattern. I strongly suggest reading this book, because if you interpret it in the right way you can clearly figure out the right way to use the so many new dimensional features of AS2005.
One thing that surprised me is that I always thought that hierarchy-modeling and dimensional-modeling was nearly the same concept, but I clearly was wrong and this book never sustained such a thesis! I was wrong because AS2000 distorted my perception of right modeling!
This book is definitely a guide to attribute-based dimensional modeling (even if the author never uses this definition). And this is the key to really leverage the potential of SSAS.
Now, the real intention for this post was to make some consideration (and to pose some question to the blog readers) about the relation between some relational models and the corresponding right design in SSAS 2005.
JUNK DIMENSION
I currently use junk dimension in the relational model even with AS2000. A junk dimension is a single table with all the significative combination (until the Cartesian product of all members) of different and not correlated indicators of flag. With AS2000 I used to place each flag/indicator in a different dimension (because, you know, these were independent hierarchies).
With SSAS 2005 I have the doubt if it would be better to place a junk dimension table in a single OLAP dimension: each attribute is naturally independent from each other to the user. I can imagine that this also improves aggregation design schema. But what about the user presentation? How to name such a dimension? I instinctively refuse to define a “flags” dimension, that has a meaning only for a geek like us but probably it’s not so meaningful for the user.
MINIDIMENSION
A minidimension table usually contains some attributes of a larger dimension tables just to improve accessibility to data in fact table. Rows are less than rows in large dimension table because only the existing attribute permutations in the real dimension defines the rows for the minidimension table. In other words, it’s a way to avoid the large-dimension pitfall of AS2000.
Now, with SSAS 2005 I was tempted to put any attribute of a large dimension (imagine one table with 2 millions of customers) into the dimension itself; but the fact that this new release has not anymore the scalability problem that affected large dimensions in AS2000, it could be still useful to use minidimensions to improve query performance. I still have not faithful numbers to define a “best practice” way (or to define a tradeoff to help choosing between minidimensions and regular “single” dimensions). Is there someone with more experience in this area? And, again, how to name a minidimension in a meaningful way?
SLOWLY CHANGING DIMENSION
We all already know what a slowly changing dimension (SCD) is, right? If not, it’s well explained in the books on line (BOL).
My concern here is in part related to the use of minidimensions. Ralph Kimball suggests to not use SCD with large dimension, placing attributes interested for SCD in one or more minidimensions. So, if it would be true that minidimensions are beneficial to SSAS 2005 performance, it should be very important to avoid SCD on large dimensions to improve performance.
In the next weeks I will have to test this scenario (SCD on large dimension) on the field and I’m worried to see performance problems (but happy to already know a possible solution). But it would be great to share some experience with some other tester….