SQL: Merge two "history tracking" tables

If you have two columns in a table that “remember” their old values in two different historical tables, as it is the case in AdventureWorks with EmployeePayHistory and EmployeeDepartmentHistory, you may need (or just wonder how) to merge these two tables into only one historical table that will keep track of both columns.

This has been my “problem of the day” and, after some thoughts, I came up with a pattern of solution that I’d like to share with you both to see if you have a better solution and to have the code at hand when I’ll need later. :)

Before using the real table, I made up a simple test case with a Product table with only two fields a couple of variation tables (VariationA and VariationB):

-- --  Sample table Products, only two columns A and B. The current value of the row is (D, 4) -- CREATE TABLE Products (     A CHAR (1),     B INT ) INSERT INTO products (A, B) VALUES ('D', 4)  -- --  Variations for column A -- --      ------------------ --      Year    Old Value --      ------------------ --      2001        A --      2002        B --      2003        C --      2005        D -- CREATE TABLE VariationA (        Position INT,     OldValue VARCHAR (5),     ChangeDate DATETIME )  INSERT VariationA (Position, OldValue, ChangeDate) VALUES (1, 'A', '01/01/2001') INSERT VariationA (Position, OldValue, ChangeDate) VALUES (2, 'B', '01/01/2002') INSERT VariationA (Position, OldValue, ChangeDate) VALUES (3, 'C', '01/01/2003') INSERT VariationA (Position, OldValue, ChangeDate) VALUES (4, 'D', '01/01/2005')  -- --  Variations for column B -- --      ------------------ --      Year    Old Value --      ------------------ --      2000        1 --      2003        2 --      2004        3 -- CREATE TABLE VariationB (        Position INT,     OldValue INT,     ChangeDate DATETIME )  INSERT VariationB (Position, OldValue, ChangeDate) VALUES (1, 1, '01/01/2000') INSERT VariationB (Position, OldValue, ChangeDate) VALUES (2, 2, '01/01/2003') INSERT VariationB (Position, OldValue, ChangeDate) VALUES (3, 3, '01/01/2004')

The query that mixes all these variations into a single table is pretty easy but it took me some time to discover. My solution is to detect what was the value of “B” when a variation in “A” happened and vice versa. As the values stored are “old” values, we know that the value of A at a certain date is the value stored in the first variation for A AFTER that date. The only special case is the current time: if no variation record is found then we know that the value to use is the current value of the record in the product table. The same is obviously true for B.

I ended up with this pattern query:

-- --  AllEvents contains all the events from both VariationA and VariationB, --  and will detect, for each variation that happened, what was --  the value of the other column at that time, building in this way the --  merged variation list. --  The final SELECT will return the sorted and DISTINCTed result.  --  Then final COALESCE is needed because if a value is NULL it  --  means that it should contain the "current" value of the column -- WITH AllEvents AS (  SELECT          OldValueOfA  = Events.OldValueOfA, -- Old value of A         OldValueOfB  = Events.OldValueOfB, -- Old value of B         ChangeDate   = Events.ChangeDate      -- Date of change  FROM              (SELECT                  OldValueOfA = OldValue,                  OldValueOfB = (SELECT TOP 1 OldValue                                   FROM VariationB V                                 WHERE V.ChangeDate >= VariationA.ChangeDate                                 ORDER BY ChangeDate),                  ChangeDate              FROM VariationA          UNION ALL              SELECT                  OldValueOfA = (SELECT TOP 1 OldValue                                   FROM VariationA V                                 WHERE V.ChangeDate >= VariationB.ChangeDate                                 ORDER BY ChangeDate),                  OldValueOfB = OldValue,                  ChangeDate              FROM VariationB         ) Events     ) SELECT DISTINCT     ChangeDate   = ChangeDate,     OldValueOfA  = COALESCE (OldValueOfA, (SELECT A FROM Products)),     OldValueOfB  = COALESCE (OldValueOfB, (SELECT B FROM Products)) FROM     AllEvents ORDER By ChangeDate

Clearly, i could have written some VB code in SSIS to solve the same problem. Nevertheless, having a pattern query at hand is useful because the pattern can be easily adapted to any real world situation with only SQL Management Studio at hand. The following code is the implementation of the pattern for AdventureWorks to merge Department and Payment history for Employees:

WITH AllEvents AS (  SELECT          EmployeeID      = Events.EmployeeID,         OldValueOfRate  = Events.OldValueOfRate,           OldValueOfDep   = Events.OldValueOfDep,            ChangeDate      = Events.ChangeDate          FROM              (SELECT                  EmployeeID      = EmployeeID,                 OldValueOfRate  = Rate,                  OldValueOfDep   = (SELECT TOP 1 DepartmentID                                      FROM HumanResources.EmployeeDepartmentHistory V                                     WHERE V.StartDate >= PayHistory.RateChangeDate                                       AND V.EmployeeID = PayHistory.EmployeeID                                     ORDER BY V.StartDate),                  ChangeDate      = RateChangeDate              FROM HumanResources.EmployeePayHistory PayHistory         UNION ALL              SELECT                  EmployeeID      = EmployeeID,                 OldValueOfRate  = (SELECT TOP 1 Rate                                      FROM HumanResources.EmployeePayHistory V                                     WHERE V.RateChangeDate >= DepHistory.StartDate                                       AND V.EmployeeID = DepHistory.EmployeeID                                     ORDER BY V.RateChangeDate),                  OldValueOfDep   = DepHistory.DepartmentID,                  ChangeDate      = StartDate             FROM HumanResources.EmployeeDepartmentHistory DepHistory         ) Events     ) SELECT DISTINCT     EmployeeID     = EmployeeID,     ChangeDate     = ChangeDate,     OldValueOfDep  = COALESCE (                          OldValueOfDep,  (SELECT DepartmentID                             FROM HumanResources.EmployeeDepartmentHistory DepHistory                           WHERE EndDate IS NULL                              AND DepHistory.EmployeeID = AllEvents.EmployeeID)),     OldValueOfRate = COALESCE (                          OldValueOfRate,  (SELECT TOP 1 Rate                            FROM HumanResources.EmployeePayHistory PayHistory                           ORDER BY RateChangeDate))                      FROM     AllEvents ORDER By EmployeeID, ChangeDate

The code can be easily checked filtering for a single EmployeeID (4 is a good candidate for this check). Please note that – in this case – the “current value” is kept in the history table and so the final COALESCE is a bit more intricated but still pretty easy both to write and to understand.

If you have a better pattern for this kind of situation or any comments on it… I’ll be glad to read your comments.