Tuesday, March 4, 2014

Auto Audit Column Wise Change Data Capture (CDC)



Change data capture is used to audit insert, update, and delete activity for table in SQL Server table. You can go through below link to understand more theoretical knowledge.


http://technet.microsoft.com/en-us/library/cc645937.aspx


To understand change data capture practical, we will follow below steps.




1) Enable Change Data Capture


Query:


EXEC sys.sp_cdc_enable_db




Verify Query:

select is_cdc_enabled from sys.databases where name = db_name()




Output:


clip_image001


2) Create table dbo.Config.


Query:


Create Table dbo.Config(

ConfigID int IDENTITY(1,1) NOT NULL,

ConfigKey varchar(100) NOT NULL,

ConfigValue varchar(4000) NOT NULL,

InsertedBy bigint,

InsertedOn datetime2(7) NULL DEFAULT getdate(),

UpdatedBy bigint,

UpdatedOn datetime2(7) NULL,

constraint PK_Config Primary Key Clustered(ConfigID asc)

)

GO




3) Enable Change Data Capture for table.


Query:


EXEC sys.sp_cdc_enable_table

@source_schema = N'dbo',

@source_name = N'Config',

@role_name = NULL,

@supports_net_changes = 1

GO




Verify Query:

Select

is_tracked_by_cdc

From

sys.tables

Where

schema_name(schema_id) = 'dbo'

and name = 'Config'




Output:


clip_image002 


clip_image003


clip_image004


clip_image005


clip_image006


4) Perform DML operation on table.


Query:


-- Insert

insert into dbo.Config values('Key','Value',1,default,null,null)

insert into dbo.Config values('Key 1','Value 1',2,default,null,null)

insert into dbo.Config values('Key 2','Value 2',2,default,null,null)



-- Update

update dbo.Config

set

    ConfigValue = 'Value 12',

    UpdatedBy = 3,

    UpdatedOn = getdate()

where

    ConfigKey = 'Key 1'

update dbo.Config

set

    ConfigValue = 'Value 22',

    UpdatedBy = 3,

    UpdatedOn = getdate()

where

    ConfigKey = 'Key 2'





Verify Query:

Select * from dbo.Config






Output:


clip_image008


5) Get all changes from last day till today.


Query:


Declare

    @begin_time datetime,

    @end_time datetime,

    @begin_lsn binary(10),

    @end_lsn binary(10);



Set @begin_time = GETDATE()-1

Set @end_time = GETDATE()



Select @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);

Select @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);



Select

    ConfigId,

    ConfigKey,

    ConfigValue,

    InsertedBy,

    InsertedOn,

    UpdatedBy,

    UpdatedOn

From

    cdc.fn_cdc_get_all_changes_dbo_config(@begin_lsn, @end_lsn, 'all')




Output:


clip_image010


6) I have seen most of company or user wants see auditing on column wise instead of row wise.


Here, I tried to create procedure sc.sp_cdc_enable_table which accepts same parameter as dbo.sp_cdc_enable_table with some addition parameters as below to enable change data capture (CDC) for given table and also create function sc.fn_cdc_get_column_changes_<capture_instance> to see column wise changes.


Additional Parameters:



  1. @inserted_by_column_name

    1. Name of the Inserted by column.

    2. Default value is “InsertedBy”



  2. @updated_by_column_name

    1. Name of the Updated by column.

    2. Default value is “UpdatedBy”






To understand practically, I used same table dbo.Config as above and followed below steps.




a) Enable Change Data Capture with Column Wise Tracking. It will enable CDC on table if it’s not and also create function sc.fn_cdc_get_column_changes_<capture_instance>.


Query:


EXEC sc.sp_cdc_enable_table

@source_schema = N'dbo',

@source_name   = N'Config',

@role_name     = NULL,

@supports_net_changes = 1

GO




Verify newly created function:


clip_image011


b) Get all column wise changes from last day till today using above function.


Query:


Declare

    @from_lsn binary(10),

    @to_lsn binary(10);



Select @from_lsn = sys.fn_cdc_get_min_lsn ( 'dbo_Config' )

Select @to_lsn = sys.fn_cdc_get_max_lsn ()



select

    *

from

    [sc].[fn_cdc_get_column_changes_dbo_Config](@from_lsn,@to_lsn,null,null)

Order by

    __$start_lsn




Output:


clip_image013


c) Disable Change Data Capture with Column Wise Tracking.


Query:


EXEC sc.sp_cdc_disable_table

@source_schema = N'dbo',

@source_name   = N'Config',

@capture_instance = 'dbo_Config'

GO




Verify Query:

Select

    is_tracked_by_cdc

From

    sys.tables

Where

    schema_name(schema_id) = 'dbo'

    and name = 'Config'




Output:


clip_image014 


Limitation:



  1. We can’t use Order by clause inside function so we need to explicitly use order by clause for __$start_lsn to get data in proper sequence.



  2. We don’t have feature to search last 10 records updated from any table for given time.



  3. To avoid above limitation, I tried to create a process which will auto audit column wise changes for CDC table to dbo.Audit and dbo.AuditDetails for all CDC enable tables configured in dbo.AuditConfig.



  4. We will go through all above scenario one by one and their configuration. Before that we need to configure some objects. Below are those objects. (Configure Audit Process.sql)




Schemas:




  1. Sc





Tables:






  1. Sc. AuditConfig



    1. Columns



      1. ConfigID



        1. Auto increment ID for table.





      2. SchemaName



        1. Schema Name of table for which we want to enable column wise CDC.





      3. TableName



        1. Table Name of table for which we want to enable column wise CDC





      4. InsertedByColumnName



        1. Inserted by Column Name for table.





      5. UpdatedByColumnName



        1. Updated by Column Name for table.





      6. CapturedColumnList



        1. Identifies the source table columns that are to be included in the change table. captured_column_list is nvarchar(max) and can be NULL. If NULL, all columns are included in the change table.



        2. Column names must be valid columns in the source table. Columns defined in a primary key index, or columns defined in an index referenced by index_name must be included.



        3. Captured_column_list is a comma-separated list of column names. Individual column names within the list can be optionally quoted by using either double quotation marks ("") or square brackets ([]). If a column name contains an embedded comma, the column name must be quoted.



        4. captured_column_list cannot contain the following reserved column names: __$start_lsn, __$end_lsn, __$seqval, __$operation, __$update_mask, __UpdatedBy and __UpdatedOn





      7. LastLsn



        1. Last __$start_lsn value after inserting audit records to audit tables.





      8. IsActive



        1. 1 = Allow CDC records to insert into audit tables.



        2. 0 = Disallow CDC records to insert into audit tables.









  2. Sc.Audit



    1. Coumns



      1. AuditID



        1. Auto increment ID for table.





      2. SchemaName



        1. Schema Name of table for which we have audit.





      3. TableName



        1. Table Name of table for which we have audit.





      4. __$start_lsn



        1. Reference lsn values for CDC table.





      5. PrimaryKeyName



        1. Primary Key Name for which we have audit.





      6. PrimaryKeyValue



        1. Primary Key Value for which we have audit.





      7. UpdatedBy



        1. InsertedBy value from table for newly inserted records



        2. UpdatedBy value for table for deleted record.



        3. Null for deleted records from table.









  3. Sc.AuditDetail



    1. Columns



      1. AuditDetailID



        1. Auto increment ID for table.





      2. AuditID



        1. Reference column from Sc.Audit table.





      3. ColumnName



        1. Column for which changes happened on table.





      4. OldColumnValue



        1. NULL for newly inserted records



        2. Old value before changes for updated and deleted records.





      5. NewColumnValue



        1. New value for inserted and updated records.



        2. NULL for deleted records.











Triggers:




  1. Sc.TRG_AuditConfig : This trigger is used to execute below steps based on operation in dbo.AuditConfig table.



    1. Insert



      1. Enable CDC to track column wise changes using sc.sp_cdc_enable_table.





    2. Delete



      1. Disable CDC to track column wise changes using sc.sp_cdc_disable_table.





    3. Update



      1. It will restrict users from updating Schema or Table Name.









Procedures:




  1. Sc.sp_cdc_enable_table: This procedure is used to enable CDC with column wise change. We have already seen demo for that in step 6 c



  2. Sc.sp_cdc_disable_table This procedure is used to disable CDC with column wise change. We have already seen demo for that in step 6 c



  3. Sc.cp_Audit: This procedure is used to audit records from CDC table using function sc.fn_cdc_get_column_changes_<capture_instance>.





Jobs:



  1. DatabaseName.dbo.job_audit



    1. Name : DatabaseName.dbo.job_audit



    2. Description: This job is used to audit records for tables which are configured in sc.AuditConfig table.



    3. Steps



      1. Name: Step 1



      2. Type: Transact-SQL script(T-SQL)



      3. Command: exec sc.cp_Audit





    4. Schedules



      1. Name: Occurs every day every 1 minute(s).



      2. Schedule type: Recurring



      3. Occurs: Daily



      4. Occurs every : 1 minute(s)



      5. Start Date: Today’s Date







  2. DatabaseName.dbo.job_error_log_alert (Check blog http://www.sqlscientist.com/2013/07/error-logging-and-alerts.html)




Example:


To understand this scenario, I have taken Employee and Department example. Below is E-R Diagram for that. Below are the details of tables. (Configure Audit Process Employee & Department tables.sql)



  1. Department



    1. There may be many departments in database.





  2. Employee



    1. There are many employees for each departments






clip_image016 


Now, our requirement is to audit column wise changes for Department and Employee table. For that, we need to insert these tables in sc.AuditConfig table. On insert on these tables into sc.AuditConfig table, it will automatically enable CDC on this table and create function sc.fn_cdc_get_column_changes_<capture_instance> using trigger Sc.TRG_AuditConfig.


Query:


insert into sc.AuditConfig values('dbo','Department','InsertedBy','UpdatedBy','ID,Name,InsertedBy,UpdatedBy',null,1,getdate(),null)



insert into sc.AuditConfig values('dbo','Employee','InsertedBy','UpdatedBy','ID,DepartmentID,Name,InsertedBy,UpdatedBy',null,1,getdate(),null)





Output:


clip_image017 


Verify:


clip_image019


clip_image020 


Now, we will talk about our daily changes on tables. We can have 3 types of changes in our tables. We will talk about each one by one.


1. Insert records.


Query:


-- Insert two records for department

insert into department(id,name,insertedby,insertedon)

values(1,'department 1', 1, getdate())



insert into department(id,name,insertedby,insertedon)

values(2,'department 2', 1, getdate())



-- Insert two records for employee

insert into employee(id,departmentid,name,insertedby,insertedon)

values (1,1,'Employee 1',1,getdate())



insert into employee(id,departmentid,name,insertedby,insertedon)

values (2,1,'Employee 1',1,getdate())





Verify Audit Query:


Note: CDC is asynchronously update CDC table. On top of that, we have job “job_Audit” to audit records to our tables. You can find some delay to get audit records.


Select 
A.SchemaName,
A.TableName,
A.PrimaryKeyName,
A.PrimaryKeyValue,
AD.ColumnName,
AD.OldColumnValue,
AD.NewColumnValue,
A.UpdatedBy,
A.UpdatedOn
From
sc.Audit A
inner join sc.AuditDetail AD on AD.AuditID = A.AuditID
Order By
A.__$start_lsn asc





Verify Audit Output:




clip_image022




2. Update records




Query:




-- Update value for department name

update department set name = name + '1', updatedby = 2, updatedon = getdate()

-- Update value for employee name

update employee set name = name + '1', updatedby = 2, updatedon = getdate()




Verify Audit Query:




You can use same query as above




Verify Audit Output:




clip_image024




3. Delete records




Query:




-- Delete records from Employee

Delete from Employee

-- Delete records from Department

Delete from department








Verify Audit Query:




You can use same query as above




Verify Audit Output:




clip_image026




Now, we had requirement to just inactive audit column wise changes to audit tables. We can do that using below query.


Query:

Update sc.AuditConfig set IsActive = 0




Verify:




clip_image028It will not disable CDC on table but it will skip this table to audit records in audit tables. Once we will active these tables again. It will audit all records from LastLsn till today into Audit table.







Now, we want to disable CDC permanently for these tables. For that, we need to delete those records from sc.AuditConfig table which will disable CDC using trigger Sc.TRG_AuditConfig.




Query:




Delete from sc.AuditConfig Output:




clip_image029 




Below are the scripts to download.
























13 comments:

  1. Hi Asif, Thank you so much for posting your solution. Could I ask a couple of questions please:
    1. Do you have to add the columns InsertBy and UpdatedBy to the source tables?
    2. Do you have to have primary key columns set up on the source tables? Is there any work around if there are no primary keys?

    ReplyDelete
    Replies
    1. Hi Aarion,

      Below are answers for your questions.
      1. As per current decision, we should have InsertedBy and UpdatedBy columns. If suppose, you have only one column ModifiedBy for your table then you can set the value ModifiedBy for both InsertedByColumnName and UpdatedByColumnName

      2. Yes, In most of database design, we will be having primary key and for that only we mostly looks for column wise changes. Assuming that, we have designed this code to use primary key from table to see column wise changes.

      Please feel free if you have more questions.

      Delete
  2. First of all again thank you for doing this solution. I am learning a lot from just implementing the code

    Could I ask if there is a way in which we could we could create a IDENTITY(1,1) column and the code pick this column as the unique column without having the primary key?

    I will be loading a very large table and loading with the primary key may slow down the process.

    Thank you

    ReplyDelete
    Replies
    1. Hi Aarion,

      First of all you need to understand that we have created wrapper on CDC which is inbuilt feature of SQL Server. But for our implementation, we are supporting column level auditing for primary key only right now.

      It's required to create unique index of primary key asp per below comments in TechNet for CDC.

      [ @index_name = ] 'index_name'
      The name of a unique index to use to uniquely identify rows in the source table. index_name is sysname and can be NULL. If specified, index_name must be a valid unique index on the source table. If index_name is specified, the identified index columns takes precedence over any defined primary key columns as the unique row identifier for the table.

      http://technet.microsoft.com/en-us/library/bb522475.aspx

      If you user unique key. It will create no clustered index and primary key is using clustered index.

      Performance wise both will be same only while inserting data.

      There are different ways you can do it if you want to load large data for you table and don't want to create index.

      If you want to load very large data initially only then you can check difference manually and load into audit table. then enable CDC on it.

      Please let me know if you looking for specific scenario.

      Delete
  3. Asif, also may I ask if the table has a composite primary key, would I have to amend the code in anyway?

    ReplyDelete
    Replies
    1. Sorry Aarion, In first version, we have supported only for single column as primary key.

      Yes, we can change code to use composite primary key.

      Please let me know if you need of it so I will try to create 2nd version of it which will support composite primary key.

      Delete
  4. Hi,

    Is this functionality available on Enterprise 2008R2 Standard..
    Thanks

    ReplyDelete
    Replies
    1. Yes, CDC is already supported in SQL Server 2008 R2.

      Delete
  5. Hi,

    Apologies my question was not as clear as it should have been.

    Is CDC available on 2008R2 Standard, I have read in some blogs that itis not available.

    Thanks in advance.

    ReplyDelete
    Replies
    1. Hi,

      I think you may be referring wrong article. Please check below link from TechNet. CDC is supported from 2008 version.

      http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx

      Delete
  6. Hi Asif , Can you please put more light on the given bellow function.
    EXEC sc.sp_cdc_enable_table
    EXEC sc.fn_cdc_get_column_changes_

    I mean to say can you please share the Functions what you have written.

    ReplyDelete
  7. Hi Upen,

    I have created new procedure "sc.sp_cdc_enable_table" on top of the existing procedure. This procedure is calling the original procedure to enable CDC and some logic to create dynamic function to get column wise changes.

    sc.fn_cdc_get_column_changes_ is used to get the column wise changes which is supported by Microsoft. It is dynamically created for each table from above procedure.

    ReplyDelete
  8. Pretty! This has been an incredibly wonderful post.
    Thanks for providing this info.

    ReplyDelete