Sunday, February 14, 2010

How to use Merge in SQLServer 2008

This is a description of how to use the SQL Server 2008 feature - "Merge ... Using".

The operation here involves 2 Tables.
Tsrc = Source table where the information is downloaded from.
Trgt = Local table, typically our working version of the table.


The real life situation / scenario is that from time to time, Tsrc will be changed. At different times, we need to update our local table Trgt such that it reflects the source table (Tsrc) but also require to keep the records of old data.


The process involve a merge between the local and the source tables and the results are kept in the local table.
- The source table is unchanged.
- The local table has extra columns BASE_RUN_I and Clatest.
- Column BASE_RUN_I keeps track of the runs each time we perform a download and merge from the source database.
- Column Clatest keeps track of whether the record is the latest record or has been modified.


Define the two tables as follows:
CREATE TABLE [dbo].[Tsrc](
[C1] [int] NOT NULL,
[C2] [nchar](3) NULL,
[C3] [date] NOT NULL
) ON [PRIMARY]



CREATE TABLE [dbo].[Trgt](
[BASE_RUN_I] [int] NOT NULL, -- a run id
[C1] [int] NOT NULL,
[C2] [nchar](3) NULL,
[C3] [date] NOT NULL,
[Clatest] [int] NOT NULL -- is 0 if record is old, is 1 if record is the new/latest.
) ON [PRIMARY]


The Merge process actually occurs in a temporary table-space facilitated by a VIEW. The view can be defined as follows:

CREATE VIEW [dbo].[VTemp]
AS
SELECT BASE_RUN_I, C1, C2, C3, Clatest
FROM dbo.Trgt
WHERE (Clatest = 1)



There is also a temporary table called UpdateRecords which are created from the Merge process. This table contain the meta-data about which records have changed from the Merge process.

The description below is based on the complete Insert Into .. Select .. Merge operations.
The results from the merge operation, which return multiple records, are inserted into the local table Trgt.

Overall Description:
1. Get desired data from source table. Lines 5-13
2. Merge results from step 1 with local table. Lines 4-26. In this process there are three conditions and one post processing.
3. Within the merge operation, the 3 conditions are: i) when there is a match between source and local table conditions; ii) when record is in source but not in local iii) when record is in local but not in source. Within each of these conditions, there are operations that perform a change to the local table.
4. At the end of the merge operation, as a post process, meta-data can be outputted as a temporary table (eg UpdateRecords table).
5. From the temporary table of the Merge operation, select the records which have been replaced. Lines 2-28.
6. From step 6, insert the results back into the local table Trgt. Line 1.


Detailed Description: From the innnermost loop ......
Lines 8-10: Downloads records from Source table, with relevant conditions.
Lines 5-13: The downloaded source data is aliased as src(C1,C2,C3), to be used as the source of the Merge operation
Line 14: Specify conditions for merge.
Lines 15-17: Columns C1, C3 are info to identify records between source and local tables. When identified, then if the C2 info is different between source and local tables, or either one is null, then update C2 info and Base_Run_I identifier.
Lines 18-21: When records in source but not in local, then put data from source to local.
Lines 22-24: When records in local but not in source, then mark the record as obsolete by setting Clatest = 0.
Line 4: This merging process does not merged with the local table, but rather it merges with the sub-table of the local table, where the entries are the most current ones, ie Clatest = 1.
Line 26: Outputs the merge process including special value $action which has values 'UPDATE', 'INSERT', etc. And also Deleted and Inserted values.
Lines 04-26: The complete merge process. At the end of this, both view and its real data in the local table Trgt, would have been changed. It also outputs meta data including inserted information and deleted information.

Eg. Deleted.BASE_RUN_I AS prevBaseRunId, Deleted.C1, Deleted.C2 AS prevCQC, Deleted.C3, 0 AS prevClatest, Inserted.Clatest AS newClatest

Lines 27: The results from the merge process, acting like a temporary table for the Select command.
Lines 02-28: Selection of META results from the merge process to extract information of updated records. Note in the actual merge process of lines 04-26, for the case when a row is being updated, the Trgt table and view will hold the updated record only, the previous record is not kept. Example

Before merge: {BASE_RUN_I, C1, C2, C3, Clatest} = {23, 1, 2, 3, 1} -> old
After merge: {BASE_RUN_I, C1, C2, C3, Clatest} = {24, 1, 5, 3, 1} -> new

There are no extra rows added to keep track of the old record. However at the end of the merge process, there are Meta-fields available which keep track of the old record as: Deleted.BASE_RUN_I, Deleted.C1, Deleted.C2, Deleted.C3, Deleted.Clatest. In this example:
{Deleted.BASE_RUN_I, Deleted.C1, Deleted.C2, Deleted.C3, Deleted.Clatest} = {23, 1, 2, 3, 1}

In Line 26, these Meta-fields are selective extracted into the temporary table called UpdateRecords. Also a new field called prevClatest is created with values 0 to indicate old results.

Finally, in line 2, the select statement chooses {prevBaseRunId, C1, prevCQC, C3, prevClatest}, in the example this is:
{23, 1, 2, 3, 0}

And ultimately in Line 1, the results from Line 2 which represent replaced records, are added back into the local table Trgt.

Lines 01-28: Add records to local table Trgt, those records selected by Lines 02-28 which are in fact old records which have been updated and use prevClatest into the Clatest column.


------------------------------------
01 INSERT INTO Trgt
02 SELECT prevBaseRunId, C1, prevCQC, C3, prevClatest
03 FROM (
04 MERGE VTemp AS trgt
05 USING (SELECT crr.C1, crr.C2, crr.C3
06 FROM Tsrc AS crr
07 INNER JOIN (
08 SELECT DISTINCT C1
09 FROM Tsrc
10 WHERE C1>2 AND C2 in ('A','A0')
11 ) as c
12 ON c.C1 = crr.C1
13 ) AS src (C1, C2, C3)
14 ON trgt.C1 = src.C1 AND trgt.C3 = src.C3
15 WHEN MATCHED AND ((trgt.C2 != src.C2) OR (ISNULL(trgt.C2,'') != ISNULL(src.C2,'')))
16 -- record exists for given client and rating date, but rating value has changed so update record
17 THEN UPDATE SET C2 = src.C2, BASE_RUN_I = @BASE_RUN_I
18 WHEN NOT MATCHED BY TARGET
19 -- records in source that are not in target, insert new records in target
20 THEN INSERT (BASE_RUN_I, C1, C2, C3, Clatest)
21 VALUES (@BASE_RUN_I, src.C1, src.C2, src.C3, 1)
22 WHEN NOT MATCHED BY SOURCE
23 -- records in target that are no longer in source, expire target records
24 THEN UPDATE SET Clatest = 0
25 -- record exists for given client and rating date, but rating value has changed so insert the previous record with old rating
26 OUTPUT $action, Deleted.BASE_RUN_I AS prevBaseRunId, Deleted.C1, Deleted.C2 AS prevCQC, Deleted.C3, 0 AS prevClatest, Inserted.Clatest AS newClatest)
27 AS UpdatedRecords (Action, prevBaseRunId, C1, prevCQC, C3, prevClatest, newClatest)
28 WHERE Action = 'UPDATE' AND newClatest = 1

No comments: