none
SSIS VB Script Issue RRS feed

  • Question

  • I have an SSIS package where I am importing records from a gate access database into another database.  Both are SQL Server databases.  I use data flow components to import the records and a VB Script component to perform other calculations on the way through.

    My issue is I need to identify records that I want to flag as duplicates.  e.g. A User might swipe their Access Card multiple times when trying to open a gate which generates multiple records in the gate access database, but because the time between each swipe is very small I want to only count the first record and flag the remaining records for the same User within the short time period as duplicates. So, as I process a record I need to check whether it is the same Access Card as the previous record I have processed, and if so, calculate the time between the 2 record date/time fields.

    I have tried using Variables to store the Card No and Date/Time of the record being processed but there are limitations on when the Variables can be updated - only in the PostExecute which I believe is after ALL records have been processed.

    Does anyone have a simple way of achieving this?


    Regards Kevin Seerup
    GoalMaker Software Solutions Pty Ltd
    www.goalmaker.com.au

    Thursday, September 12, 2019 3:11 AM

Answers

  • Hi Kevin,

    You can try to use LAG() and DATEDIFF() functions in the source query to identify time deltas for the same CardNo. This way you won't need a Script Component in the SSIS Data Flow.

    Check it out:

    -- DDL and sample data population, start
    DECLARE @mockTbl TABLE (
    	ID INT IDENTITY(1,1) PRIMARY KEY
    	, [CardNo] VARCHAR(30)
    	, [DateTimeProcessed] DATETIME
    );
    INSERT INTO @mockTbl (CardNo, DateTimeProcessed)
    VALUES ('100', '2019-09-12 09:00:00')
    	,('152', '2019-09-12 09:10:00')
    	,('152', '2019-09-12 09:10:10')
    	,('160', '2019-09-12 09:12:10');
    -- DDL and sample data population, end
    
    ;WITH rs AS
    (
    SELECT * 
    	, LAG(DateTimeProcessed, 1) OVER (PARTITION BY CardNo ORDER BY DateTimeProcessed) AS LagValue
    FROM @mockTbl
    )
    SELECT *
    	, IIF(DATEDIFF(second, LagValue, DateTimeProcessed) <= 10, 1, 0) AS [duplicate]
    FROM rs
    ORDER BY id, DateTimeProcessed;

    Output:
    ID	CardNo	DateTimeProcessed	  LagValue	duplicate
    1	100	2019-09-12 09:00:00.000	  NULL	          0
    2	152	2019-09-12 09:10:00.000	  NULL	          0
    3	152	2019-09-12 09:10:10.000	  2019-09-12 09:10:00.000	  1
    4	160	2019-09-12 09:12:10.000	  NULL	          0
    • Edited by Yitzhak Khabinsky Thursday, September 12, 2019 5:05 AM
    • Marked as answer by GoalMaker Thursday, September 12, 2019 5:24 AM
    Thursday, September 12, 2019 4:50 AM

All replies

  • Hi Kevin,

    Please share your DDL and sample data population.

    And use it to explain the logic.

    Thursday, September 12, 2019 3:48 AM
  • It isn't a complex issue.  Records might look like the following:


    As each record is read the Card No and Time are stored. 

    1. Read Record 1 - first record, no duplicate.
    2. Read Record 2 - different Card No to Record 1 - no duplicate
    3. Read Record 3 - same Card No as Record 2, time is within 10 seconds of Record 2 - flag as duplicate
    4. Read Record 4 - different Card No to Record 3 - no duplicate


    Regards Kevin Seerup
    GoalMaker Software Solutions Pty Ltd
    www.goalmaker.com.au


    • Edited by GoalMaker Thursday, September 12, 2019 4:15 AM
    Thursday, September 12, 2019 4:14 AM
  • Hi Kevin,

    One more question: what is your SQL Server version?

    Thursday, September 12, 2019 4:23 AM
  • SQL Server 2014

    Regards Kevin Seerup
    GoalMaker Software Solutions Pty Ltd
    www.goalmaker.com.au

    Thursday, September 12, 2019 4:28 AM
  • Hi Kevin,

    You can try to use LAG() and DATEDIFF() functions in the source query to identify time deltas for the same CardNo. This way you won't need a Script Component in the SSIS Data Flow.

    Check it out:

    -- DDL and sample data population, start
    DECLARE @mockTbl TABLE (
    	ID INT IDENTITY(1,1) PRIMARY KEY
    	, [CardNo] VARCHAR(30)
    	, [DateTimeProcessed] DATETIME
    );
    INSERT INTO @mockTbl (CardNo, DateTimeProcessed)
    VALUES ('100', '2019-09-12 09:00:00')
    	,('152', '2019-09-12 09:10:00')
    	,('152', '2019-09-12 09:10:10')
    	,('160', '2019-09-12 09:12:10');
    -- DDL and sample data population, end
    
    ;WITH rs AS
    (
    SELECT * 
    	, LAG(DateTimeProcessed, 1) OVER (PARTITION BY CardNo ORDER BY DateTimeProcessed) AS LagValue
    FROM @mockTbl
    )
    SELECT *
    	, IIF(DATEDIFF(second, LagValue, DateTimeProcessed) <= 10, 1, 0) AS [duplicate]
    FROM rs
    ORDER BY id, DateTimeProcessed;

    Output:
    ID	CardNo	DateTimeProcessed	  LagValue	duplicate
    1	100	2019-09-12 09:00:00.000	  NULL	          0
    2	152	2019-09-12 09:10:00.000	  NULL	          0
    3	152	2019-09-12 09:10:10.000	  2019-09-12 09:10:00.000	  1
    4	160	2019-09-12 09:12:10.000	  NULL	          0
    • Edited by Yitzhak Khabinsky Thursday, September 12, 2019 5:05 AM
    • Marked as answer by GoalMaker Thursday, September 12, 2019 5:24 AM
    Thursday, September 12, 2019 4:50 AM
  • Thank you very much for your help!

    Regards Kevin Seerup
    GoalMaker Software Solutions Pty Ltd
    www.goalmaker.com.au

    Thursday, September 12, 2019 5:25 AM