If you have two columns in a table that “remember” their old values in two different historical tables, as it is the case in AdventureWorks with EmployeePayHistory and EmployeeDepartmentHistory, you may need (or just wonder how) to merge these two tables into only one historical table that will keep track of both columns.
This has been my “problem of the day” and, after some thoughts, I came up with a pattern of solution that I’d like to share with you both to see if you have a better solution and to have the code at hand when I’ll need later. :)
Before using the real table, I made up a simple test case with a Product table with only two fields a couple of variation tables (VariationA and VariationB):
-- -- Sample table Products, only two columns A and B. The current value of the row is (D, 4) -- CREATE TABLE Products ( A CHAR (1), B INT ) INSERT INTO products (A, B) VALUES ('D', 4) -- -- Variations for column A -- -- ------------------ -- Year Old Value -- ------------------ -- 2001 A -- 2002 B -- 2003 C -- 2005 D -- CREATE TABLE VariationA ( Position INT, OldValue VARCHAR (5), ChangeDate DATETIME ) INSERT VariationA (Position, OldValue, ChangeDate) VALUES (1, 'A', '01/01/2001') INSERT VariationA (Position, OldValue, ChangeDate) VALUES (2, 'B', '01/01/2002') INSERT VariationA (Position, OldValue, ChangeDate) VALUES (3, 'C', '01/01/2003') INSERT VariationA (Position, OldValue, ChangeDate) VALUES (4, 'D', '01/01/2005') -- -- Variations for column B -- -- ------------------ -- Year Old Value -- ------------------ -- 2000 1 -- 2003 2 -- 2004 3 -- CREATE TABLE VariationB ( Position INT, OldValue INT, ChangeDate DATETIME ) INSERT VariationB (Position, OldValue, ChangeDate) VALUES (1, 1, '01/01/2000') INSERT VariationB (Position, OldValue, ChangeDate) VALUES (2, 2, '01/01/2003') INSERT VariationB (Position, OldValue, ChangeDate) VALUES (3, 3, '01/01/2004')
The query that mixes all these variations into a single table is pretty easy but it took me some time to discover. My solution is to detect what was the value of “B” when a variation in “A” happened and vice versa. As the values stored are “old” values, we know that the value of A at a certain date is the value stored in the first variation for A AFTER that date. The only special case is the current time: if no variation record is found then we know that the value to use is the current value of the record in the product table. The same is obviously true for B.
I ended up with this pattern query:
-- -- AllEvents contains all the events from both VariationA and VariationB, -- and will detect, for each variation that happened, what was -- the value of the other column at that time, building in this way the -- merged variation list. -- The final SELECT will return the sorted and DISTINCTed result. -- Then final COALESCE is needed because if a value is NULL it -- means that it should contain the "current" value of the column -- WITH AllEvents AS ( SELECT OldValueOfA = Events.OldValueOfA, -- Old value of A OldValueOfB = Events.OldValueOfB, -- Old value of B ChangeDate = Events.ChangeDate -- Date of change FROM (SELECT OldValueOfA = OldValue, OldValueOfB = (SELECT TOP 1 OldValue FROM VariationB V WHERE V.ChangeDate >= VariationA.ChangeDate ORDER BY ChangeDate), ChangeDate FROM VariationA UNION ALL SELECT OldValueOfA = (SELECT TOP 1 OldValue FROM VariationA V WHERE V.ChangeDate >= VariationB.ChangeDate ORDER BY ChangeDate), OldValueOfB = OldValue, ChangeDate FROM VariationB ) Events ) SELECT DISTINCT ChangeDate = ChangeDate, OldValueOfA = COALESCE (OldValueOfA, (SELECT A FROM Products)), OldValueOfB = COALESCE (OldValueOfB, (SELECT B FROM Products)) FROM AllEvents ORDER By ChangeDate
Clearly, i could have written some VB code in SSIS to solve the same problem. Nevertheless, having a pattern query at hand is useful because the pattern can be easily adapted to any real world situation with only SQL Management Studio at hand. The following code is the implementation of the pattern for AdventureWorks to merge Department and Payment history for Employees:
WITH AllEvents AS ( SELECT EmployeeID = Events.EmployeeID, OldValueOfRate = Events.OldValueOfRate, OldValueOfDep = Events.OldValueOfDep, ChangeDate = Events.ChangeDate FROM (SELECT EmployeeID = EmployeeID, OldValueOfRate = Rate, OldValueOfDep = (SELECT TOP 1 DepartmentID FROM HumanResources.EmployeeDepartmentHistory V WHERE V.StartDate >= PayHistory.RateChangeDate AND V.EmployeeID = PayHistory.EmployeeID ORDER BY V.StartDate), ChangeDate = RateChangeDate FROM HumanResources.EmployeePayHistory PayHistory UNION ALL SELECT EmployeeID = EmployeeID, OldValueOfRate = (SELECT TOP 1 Rate FROM HumanResources.EmployeePayHistory V WHERE V.RateChangeDate >= DepHistory.StartDate AND V.EmployeeID = DepHistory.EmployeeID ORDER BY V.RateChangeDate), OldValueOfDep = DepHistory.DepartmentID, ChangeDate = StartDate FROM HumanResources.EmployeeDepartmentHistory DepHistory ) Events ) SELECT DISTINCT EmployeeID = EmployeeID, ChangeDate = ChangeDate, OldValueOfDep = COALESCE ( OldValueOfDep, (SELECT DepartmentID FROM HumanResources.EmployeeDepartmentHistory DepHistory WHERE EndDate IS NULL AND DepHistory.EmployeeID = AllEvents.EmployeeID)), OldValueOfRate = COALESCE ( OldValueOfRate, (SELECT TOP 1 Rate FROM HumanResources.EmployeePayHistory PayHistory ORDER BY RateChangeDate)) FROM AllEvents ORDER By EmployeeID, ChangeDate
The code can be easily checked filtering for a single EmployeeID (4 is a good candidate for this check). Please note that – in this case – the “current value” is kept in the history table and so the final COALESCE is a bit more intricated but still pretty easy both to write and to understand.
If you have a better pattern for this kind of situation or any comments on it… I’ll be glad to read your comments.
Rounds a number down to the nearest integer.
INT ( <Number> )
When a column name is given, returns a single-column table of unique values. When a table name is given, returns a table with the same columns and all the rows of the table (including duplicates) with the additional blank row caused by an invalid relationship if present.
VALUES ( <TableNameOrColumnName> )
Returns the first argument that does not evaluate to a blank value. If all arguments evaluate to blank values, BLANK is returned.
COALESCE ( <Value1>, <Value2> [, <Value2> [, … ] ] )
Returns the union of the tables whose columns match.
UNION ( <Table>, <Table> [, <Table> [, … ] ] )
Returns all the rows in a table, or all the values in a column, ignoring any filters that might have been applied.
ALL ( [<TableNameOrColumnName>] [, <ColumnName> [, <ColumnName> [, … ] ] ] )
Returns a one column table that contains the distinct (unique) values in a column, for a column argument. Or multiple columns with distinct (unique) combination of values, for a table expression argument.
DISTINCT ( <ColumnNameOrTableExpr> )
Checks whether all arguments are TRUE, and returns TRUE if all arguments are TRUE.
AND ( <Logical1>, <Logical2> )