none
Hierarchy & Right Query using CTE

    Question

  • Hi there,

    I'm using SQL Server 2008 R2, I discovered this nice feature about CTE and Hierarchy. So here is my example

    CREATE TABLE MyServices
    (ServiceId INTEGER NOT NULL,
     MsgId INTEGER NOT NULL,
      EventId INTEGER NOT NULL,
     PRIMARY KEY (ServiceId, MsgId, EventId);

    INSERT INTO MyServices
    VALUES (19, 100, 1), (19, 101, 2), (19, 102, 3), (29, 102, 4), (39, 102, 5), (39, 112, 6)

    INSERT INTO MyServices
    VALUES (39, 122, 7), (39, 132, 8), (69, 132, 9), (69, 182, 10), (70, 192, 11)

    The EventId is an incremetal counter of the events

    Basically, below are my needs :

    • The parent "ServiceId" is the one that has the smallest EventId for a given combination
    • The ServiceId in the first lines is 19 (the parent for this combination), and the MsgId is different until the MsgId=102, in which case the ServiceId (29 and 39) would be the child of the ServiceId (19) since they share the same MsgId
    • Then the hierarchy continues until MsgId (132), in this case the ServiceId (69) would be the child of ServiceId (39), which in turns is the child of (19)
    • and so on...
    • Until the ServiceId(70), which is a new parent of a new combination since it is not related to the combination of ServiceId(19)

    Ideally I want to get the following result :

    ParentService_id, service_id, msg_id, event_id, depth
    19, 19, 100, 1, 0
    19, 19, 101, 2, 0
    19, 19, 102, 3, 0
    19, 29, 102, 4, 1
    19, 39, 102, 5, 1
    19, 39, 112, 6, 1
    19, 39, 122, 7, 1
    19, 39, 132, 8, 1
    19, 69, 132, 9, 2
    19, 69, 182, 10, 2
    70, 70, 192, 11, 0

    This of course has to be generic, I don't know when the id of a new parent starts

    I tried several combinations with no success, I'm not very confortable with the CTE syntax

    ;WITH MyCteService (ParentServiceId, ServiceId, MsgId, EventId, Lvl)
    AS
    (
        SELECT null as ParentServiceId, e.ServiceId, e.MsgId, e.EventId, 0 as Lvl
        FROM MyServices AS e
        where e.ServiceId = 19 --to be commented
        UNION ALL
        SELECT d.ServiceId as ParentServiceId, e.ServiceId, d.MsgId, e.EventId, (Lvl + 1) as Lvl
        FROM MyServices AS e
            INNER JOIN MyCteService AS d
                ON e.EventId > d.EventId and (e.MsgId = d.MsgId or e.ServiceId=d.ServiceId)
                    
    )

    SELECT * FROM MyCteService

    Any help on this?

    Thanks,

    Stefan


    Reporting Solution for BizTalk | follow-us on twitter

    Thursday, October 24, 2013 4:19 PM

Answers

  • There is no doubt that this is a bizarre model.

    To find the parents you could use the following query:

    SELECT
    	ServiceId,
    	MIN(EventId) AS min_EventId,
    	COUNT(*) AS cnt
    FROM
    	@MyServices
    GROUP BY
    	ServiceId
    
    INTERSECT
    
    SELECT
    	ServiceId,
    	MIN(EventId) AS min_EventId,
    	COUNT(*) AS cnt
    FROM
    	@MyServices AS A
    WHERE
    	NOT EXISTS (
    	SELECT
    		*
    	FROM
    		@MyServices AS B
    	WHERE
    		B.EventId < A.EventId
    		AND B.ServiceId <> A.ServiceId
    		AND B.MsgId = A.MsgId
    	)
    GROUP BY
    	ServiceId;
    GO

    As for assigning the ParentServiceID we couls use:

    SET NOCOUNT ON;
    USE tempdb;
    GO
    DECLARE @MyServices TABLE
        (
          ServiceId INTEGER NOT NULL ,
          MsgId INTEGER NOT NULL ,
          EventId INTEGER NOT NULL ,
          PRIMARY KEY ( ServiceId, MsgId, EventId )
        );
    
    INSERT  INTO @MyServices
    VALUES  ( 19, 100, 1 ),
            ( 19, 101, 2 ),
            ( 19, 102, 3 ),
            ( 29, 102, 4 ),
            ( 39, 102, 5 ),
            ( 39, 112, 6 )
    
    INSERT  INTO @MyServices
    VALUES  ( 39, 122, 7 ),
            ( 39, 132, 8 ),
            ( 69, 132, 9 ),
            ( 69, 182, 10 ),
            ( 70, 192, 11 );
    
    WITH C1 AS (
    SELECT
    	P.ServiceId AS ParentServiceID,
        P.ServiceId,
        P.MsgId,
        P.EventId
    FROM
    	@MyServices AS P
    	INNER JOIN
    	(
    	SELECT
    		ServiceId,
    		MIN(EventId) AS min_EventId,
    		COUNT(*) AS cnt
    	FROM
    		@MyServices
    	GROUP BY
    		ServiceId
    
    	INTERSECT
    
    	SELECT
    		ServiceId,
    		MIN(EventId) AS min_EventId,
    		COUNT(*) AS cnt
    	FROM
    		@MyServices AS A
    	WHERE
    		NOT EXISTS (
    		SELECT
    			*
    		FROM
    			@MyServices AS B
    		WHERE
    			B.EventId < A.EventId
    			AND B.ServiceId <> A.ServiceId
    			AND B.MsgId = A.MsgId
    		)
    	GROUP BY
    		ServiceId
    	) AS Q
    	ON P.ServiceId = Q.ServiceId
    	AND P.EventId = Q.min_EventId
    
    UNION ALL
    
    SELECT
    	P.ParentServiceID,
        C.ServiceId,
        C.MsgId,
        C.EventId
    FROM
    	C1 AS P
    	INNER JOIN
    	@MyServices AS C
    	ON (C.EventId = P.EventId + 1 AND C.ServiceId = P.ServiceId)
    	OR (C.EventId = P.EventId + 1 AND C.ServiceId <> P.ServiceId AND C.MsgId = P.MsgId)
    )
    SELECT
    	*
    FROM
    	C1
    ORDER BY
    	ParentServiceID,
    	EventId;
    GO

    I got a headache trying to understand the logic around the [depth] of the levels, so I am going to owe you that part for now.

    AMB

    Some guidelines for posting questions...

    Friday, October 25, 2013 2:34 PM

All replies

  • Makes no sense. "The parent "ServiceId" is the one that has the smallest EventId for a given combination" combination of what? Columns, values, gorns (scnr, tbbt on tv)?
    Thursday, October 24, 2013 4:48 PM
  • In addition to Stefan notes, sounds like there is a database design problem. It seems that you have two column ( ServiceId, MsgId ) in one table that present one attribute. 

    Saeid Hasani, sqldevelop.wordpress.com

    Download Books Online for SQL Server 2012

    Thursday, October 24, 2013 5:06 PM
  • Basically I want to identify each flow, I have only a combination of ServiceId and MsgIds and EventId to succeed. Flow is triggered with an event, this event might generate other events, which in turn generate other events and so on. The link between the lines stored of a flow is the MsgId and the EventId.

    My goal is to identify the Parent ServiceId, and all the children ServiceIds (with the corresponding MsgIds). I want to know what triggered this flow => identify the Parent

    I simplified the query, the real tables have many more columns that are irrelevant to determine the flow hierarchy, I don't have the ability to modify how the data is populated inside the tables, otherwise I would have done it, it is a proprietary database.

    If there is no link between the events and MsgId then it is a new flow...

    So my sentence : "The parent "ServiceId" is the one that has the smallest EventId for a given combination", Is right; to identify the parent we will use the smallest EventId" 

     

    In the result I don't mind having (a null value for the parents) :

    null, 19, 100, 1, 0
    null, 19, 101, 2, 0
    null, 19, 102, 3, 0
    19, 29, 102, 4, 1
    19, 39, 102, 5, 1
    19, 39, 112, 6, 1
    19, 39, 122, 7, 1
    19, 39, 132, 8, 1
    19, 69, 132, 9, 2
    19, 69, 182, 10, 2
    null, 70, 192, 11, 0

    Do you think this is achievable? what would be the best way ?


    Reporting Solution for BizTalk | follow-us on twitter



    Thursday, October 24, 2013 9:35 PM
  • So this is how we find the parent?

    DECLARE @MyServices TABLE
        (
          ServiceId INTEGER NOT NULL ,
          MsgId INTEGER NOT NULL ,
          EventId INTEGER NOT NULL ,
          PRIMARY KEY ( ServiceId, MsgId, EventId )
        );
    
    INSERT  INTO @MyServices
    VALUES  ( 19, 100, 1 ),
            ( 19, 101, 2 ),
            ( 19, 102, 3 ),
            ( 29, 102, 4 ),
            ( 39, 102, 5 ),
            ( 39, 112, 6 )
    
    INSERT  INTO @MyServices
    VALUES  ( 39, 122, 7 ),
            ( 39, 132, 8 ),
            ( 69, 132, 9 ),
            ( 69, 182, 10 ),
            ( 70, 192, 11 )
    
    
    -- 1. Find Parent
    SELECT TOP 1 *
    FROM @MyServices 
    ORDER BY EventId ASC;

    Friday, October 25, 2013 8:43 AM
  • Ahh, you are funy Stefan :-)

    The parents in my case are the lines with 19 and the line 70 (since the service id =70) is the start of the new flow, and hence it is the parent that your query wouldn't return

    Thx for the try ...


    Reporting Solution for BizTalk | follow-us on twitter


    Friday, October 25, 2013 9:50 AM
  • Come on, that's not funny at all.

    I asked: Combination of what? Got this answer:

    "The parent "ServiceId" is the one that has the smallest EventId for a given combination", Is right; to identify the parent we will use the smallest EventId" 

    "to identify the parent we will use the smallest EventId" I've used this in the query above. So, please, you need to be precise..

    DECLARE @MyServices TABLE
        (
          ServiceId INTEGER NOT NULL ,
          MsgId INTEGER NOT NULL ,
          EventId INTEGER NOT NULL ,
          PRIMARY KEY ( ServiceId, MsgId, EventId )
        );
    
    INSERT  INTO @MyServices
    VALUES  ( 19, 100, 1 ),
            ( 19, 101, 2 ),
            ( 19, 102, 3 ),
            ( 29, 102, 4 ),
            ( 39, 102, 5 ),
            ( 39, 112, 6 );
    
    INSERT  INTO @MyServices
    VALUES  ( 39, 122, 7 ),
            ( 39, 132, 8 ),
            ( 69, 132, 9 ),
            ( 69, 182, 10 ),
            ( 70, 192, 11 );
    
    
    -- 1. Find Parent
    WITH    Ordered
              AS ( SELECT   * ,
                            ROW_NUMBER() OVER ( PARTITION BY ServiceId ORDER BY EventId ) AS RN
                   FROM     @MyServices
                 )
        SELECT  *
        FROM    Ordered
        WHERE   RN = 1;

    So, what is step 2?
    Friday, October 25, 2013 10:20 AM
  • Ok, I'll try to explain with a graphic  please see picture below:

    So the EventId is not known in advance, it might start at 1, 11, or 2549.

    The combination: the same serviceid gives a set of differents msgids, if one of the message ids is referenced by another serviceids then it is part of the same flow..etc. as shown in the graph.

    The combination : 70, 192, 11 of the image is a new flow, since the msgid=192 is not linked to previous serviceid.

    I hope this is a better explanation


    Reporting Solution for BizTalk | follow-us on twitter




    Friday, October 25, 2013 12:31 PM
  • There is no doubt that this is a bizarre model.

    To find the parents you could use the following query:

    SELECT
    	ServiceId,
    	MIN(EventId) AS min_EventId,
    	COUNT(*) AS cnt
    FROM
    	@MyServices
    GROUP BY
    	ServiceId
    
    INTERSECT
    
    SELECT
    	ServiceId,
    	MIN(EventId) AS min_EventId,
    	COUNT(*) AS cnt
    FROM
    	@MyServices AS A
    WHERE
    	NOT EXISTS (
    	SELECT
    		*
    	FROM
    		@MyServices AS B
    	WHERE
    		B.EventId < A.EventId
    		AND B.ServiceId <> A.ServiceId
    		AND B.MsgId = A.MsgId
    	)
    GROUP BY
    	ServiceId;
    GO

    As for assigning the ParentServiceID we couls use:

    SET NOCOUNT ON;
    USE tempdb;
    GO
    DECLARE @MyServices TABLE
        (
          ServiceId INTEGER NOT NULL ,
          MsgId INTEGER NOT NULL ,
          EventId INTEGER NOT NULL ,
          PRIMARY KEY ( ServiceId, MsgId, EventId )
        );
    
    INSERT  INTO @MyServices
    VALUES  ( 19, 100, 1 ),
            ( 19, 101, 2 ),
            ( 19, 102, 3 ),
            ( 29, 102, 4 ),
            ( 39, 102, 5 ),
            ( 39, 112, 6 )
    
    INSERT  INTO @MyServices
    VALUES  ( 39, 122, 7 ),
            ( 39, 132, 8 ),
            ( 69, 132, 9 ),
            ( 69, 182, 10 ),
            ( 70, 192, 11 );
    
    WITH C1 AS (
    SELECT
    	P.ServiceId AS ParentServiceID,
        P.ServiceId,
        P.MsgId,
        P.EventId
    FROM
    	@MyServices AS P
    	INNER JOIN
    	(
    	SELECT
    		ServiceId,
    		MIN(EventId) AS min_EventId,
    		COUNT(*) AS cnt
    	FROM
    		@MyServices
    	GROUP BY
    		ServiceId
    
    	INTERSECT
    
    	SELECT
    		ServiceId,
    		MIN(EventId) AS min_EventId,
    		COUNT(*) AS cnt
    	FROM
    		@MyServices AS A
    	WHERE
    		NOT EXISTS (
    		SELECT
    			*
    		FROM
    			@MyServices AS B
    		WHERE
    			B.EventId < A.EventId
    			AND B.ServiceId <> A.ServiceId
    			AND B.MsgId = A.MsgId
    		)
    	GROUP BY
    		ServiceId
    	) AS Q
    	ON P.ServiceId = Q.ServiceId
    	AND P.EventId = Q.min_EventId
    
    UNION ALL
    
    SELECT
    	P.ParentServiceID,
        C.ServiceId,
        C.MsgId,
        C.EventId
    FROM
    	C1 AS P
    	INNER JOIN
    	@MyServices AS C
    	ON (C.EventId = P.EventId + 1 AND C.ServiceId = P.ServiceId)
    	OR (C.EventId = P.EventId + 1 AND C.ServiceId <> P.ServiceId AND C.MsgId = P.MsgId)
    )
    SELECT
    	*
    FROM
    	C1
    ORDER BY
    	ParentServiceID,
    	EventId;
    GO

    I got a headache trying to understand the logic around the [depth] of the levels, so I am going to owe you that part for now.

    AMB

    Some guidelines for posting questions...

    Friday, October 25, 2013 2:34 PM
  • Thanks a lot Hunchback,

    I understand the logic that you followed.

    The result is what I expected, but I'm not sure, the query would be performant in a real world scenario where there would be a lot of data. Is there any way to simplify the code? by adding for instance other columns, or generating intermediate tables?

    It is strange though that a table with 3 columns (and few lines in my example) generates that much of code :-)


    Reporting Solution for BizTalk | follow-us on twitter

    Friday, October 25, 2013 2:57 PM
  • >  Is there any way to simplify the code?

    Yes, change the model ;)

    You could use "Adjancecy List" or "Nested Sets". Joe Celko has written good articles around the "Nested Sets" model, and Itzik Ben-Gan touched the "Adjacency List" in his book about "TSQL Querying".



    AMB

    Some guidelines for posting questions...

    Friday, October 25, 2013 3:21 PM