none
sql query to find exact matches

    질문

  • i have a table A as follows

    Create Table A(

    Id int,

    X int,

    Y int

    Primary key (Id)

    )

    table contains

    Id    X      Y

    1      2       3

    2      2       4

    3      3       3

    4      3       4

    5       4      4

    6       5       5

    now from the data x values 2 and 3 contains exact matches of Y  so count is 2

    x values 2 and 3 and 4 contains atlest one match so count is 3 and

    x value 5 doesnt contain any matches so count 0

    i need this from sql query how to do this 

    2018년 7월 10일 화요일 오전 4:34

답변

  • I am a little confused over what you want, but i came up with the following anyway. Note, it uses STRING_AGG() which is only available in SQL Server 2o17 and above.

    WITH
    	A(Id, X, Y)
    AS
    	(
    	 SELECT 1, 2, 3 UNION ALL
    	 SELECT 2, 2, 4 UNION ALL
    	 SELECT 3, 3, 3 UNION ALL
    	 SELECT 4, 3, 4 UNION ALL
    	 SELECT 5, 4, 4 UNION ALL
    	 SELECT 6, 5, 5
    	),
    	Aggregate(X, Y)
    AS
    	(
    	 SELECT
    		X,
    		STRING_AGG(Y, ' ') WITHIN GROUP(ORDER BY Y)
    	 FROM
    		A
    	 GROUP BY
    		X
    	),
    	Match_All(Y, Y_Count)
    AS
    	(
    	 SELECT
    		Y,
    		COUNT(Y)
    	 FROM
    		Aggregate
    	 GROUP BY
    		Y
    	 HAVING
    		COUNT(Y) > 1
    	),
    	Match_Some
    AS
    	(
    	 SELECT
    		X
    	 FROM
    		A A1
    	 WHERE
    		--Some match
    		EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			A A2
    		 WHERE
    			A2.X <> A1.X
    		   AND	A2.Y = A1.Y
    		)
    		--But not all match
    	   AND	NOT EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			Match_All,
    			Aggregate
    		 WHERE
    			Aggregate.X = A1.X
    		   AND	Aggregate.Y = Match_All.Y
    		)
    	),
    	Match_None
    AS
    	(
    	 SELECT
    		X,
    		Y
    	 FROM
    		A A1
    	 WHERE
    		NOT EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			A A2
    		 WHERE
    			A2.X <> A1.X
    		   AND	A2.Y = A1.Y
    		)
    	)
    SELECT
    	(SELECT SUM(Y_Count) FROM Match_All)		CountOfExactMatch,
    	(SELECT SUM(X) FROM Match_Some)			CountofOverlappingMatch,
    	(SELECT COUNT(DISTINCT X) FROM Match_None)	CountNoMatches
    GO

    2018년 7월 10일 화요일 오후 12:36
    중재자

모든 응답

  • Your description of the expected output seems not clear. Can you post the expected output in table format?
    • 편집됨 SQLNeophyte 2018년 7월 10일 화요일 오전 5:14
    2018년 7월 10일 화요일 오전 5:13
  • Understood your description but didnt understand how you want result to be displayed

    Please post how the sample output would be in your case


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    2018년 7월 10일 화요일 오전 5:23
  • Sample output

    CountOfExactMatch   CountofOverlappingMatch  CountNoMatches

    2 (X values 2 and 3)   3 (X values 2,3,4)                  1 (X value of 5)

    2018년 7월 10일 화요일 오전 6:22
  • Sample output

    CountOfExactMatch   CountofOverlappingMatch  CountNoMatches

    2 (X values 2 and 3)   3 (X values 2,3,4)                  1 (X value of 5)

    Sorry didnt understand

    How can you have a generic single line result like this

    there are multiple overlap list  like 3,4 4 alone, 5 alone etc

    so on what basis you present an output like this?

    The above output includes only the stats for the list 3,4

    if you take 5 as an example output will be

    CountOfExactMatch   CountofOverlappingMatch  CountNoMatches

    1 (X value 5)                 0                                           3 (X values of 2,3 and 4)

    so how would you determine which overlap list of Y is to be considered? is it a user input?


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    2018년 7월 10일 화요일 오전 7:02
  • Sample output
    
    
    CountOfExactMatch   CountofOverlappingMatch  CountNoMatches
    
    2 (X values 2 and 3)   3 (X values 2,3,4)                  1 (X value of 5)
    

    sorry couldn't understand your requirement. But, are you looking for the below:

    select count(Id) from Table_A where X=Y -- X and Y value are same

    select count(Id) from Table_A where X!=Y -- X & Y not same

    select count(Id) from Table_A where X in(select Y from Table_A) -- x value matching any one of the Y value

    select count(Id) from Table_A where X not in(select Y from Table_A) -- x value not matching any of the Y value

    2018년 7월 10일 화요일 오전 8:17
  • the sample output 1st column should have total count of id having exact matches of column y values 

    and 2nd column should have total count of id having at least one match  of y values

    and 3rd column having id count without any match of y value with any other ids

    2018년 7월 10일 화요일 오전 9:21
  • the sample output 1st column should have total count of id having exact matches of column y values 

    and 2nd column should have total count of id having at least one match  of y values

    and 3rd column having id count without any match of y value with any other ids

    nope 

    you still didnt get my point

    You're not considering full y values here (5 is left out)

    Thats why I asked on what basis you select the required y values. Is it based on user input?

    Otherwise there can be multiple outputs

    if i take just 5 as an example as I illustrated previously below is a valid output too

    CountOfExactMatch   CountofOverlappingMatch  CountNoMatches

    1 (X value 5)                 0                                           3 (X values of 2,3 and 4)

    So you need to either consider full Y values (including 5 in your example) as the list or have a criteria to determine what values from which group you need to consider (for example 3,4 was the group you considered in your posted example)

    If you include 5 to the group (3,4,5) the output will change to below as per your rule

    CountOfExactMatch   CountofOverlappingMatch  CountNoMatches

    0                                  4 (X values 2,3,4)                  0 (X value of 5)


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    2018년 7월 10일 화요일 오전 9:31
  • yes 

    2018년 7월 10일 화요일 오전 11:22
  • @Jackuline yes

    2018년 7월 10일 화요일 오전 11:44
  • I am a little confused over what you want, but i came up with the following anyway. Note, it uses STRING_AGG() which is only available in SQL Server 2o17 and above.

    WITH
    	A(Id, X, Y)
    AS
    	(
    	 SELECT 1, 2, 3 UNION ALL
    	 SELECT 2, 2, 4 UNION ALL
    	 SELECT 3, 3, 3 UNION ALL
    	 SELECT 4, 3, 4 UNION ALL
    	 SELECT 5, 4, 4 UNION ALL
    	 SELECT 6, 5, 5
    	),
    	Aggregate(X, Y)
    AS
    	(
    	 SELECT
    		X,
    		STRING_AGG(Y, ' ') WITHIN GROUP(ORDER BY Y)
    	 FROM
    		A
    	 GROUP BY
    		X
    	),
    	Match_All(Y, Y_Count)
    AS
    	(
    	 SELECT
    		Y,
    		COUNT(Y)
    	 FROM
    		Aggregate
    	 GROUP BY
    		Y
    	 HAVING
    		COUNT(Y) > 1
    	),
    	Match_Some
    AS
    	(
    	 SELECT
    		X
    	 FROM
    		A A1
    	 WHERE
    		--Some match
    		EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			A A2
    		 WHERE
    			A2.X <> A1.X
    		   AND	A2.Y = A1.Y
    		)
    		--But not all match
    	   AND	NOT EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			Match_All,
    			Aggregate
    		 WHERE
    			Aggregate.X = A1.X
    		   AND	Aggregate.Y = Match_All.Y
    		)
    	),
    	Match_None
    AS
    	(
    	 SELECT
    		X,
    		Y
    	 FROM
    		A A1
    	 WHERE
    		NOT EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			A A2
    		 WHERE
    			A2.X <> A1.X
    		   AND	A2.Y = A1.Y
    		)
    	)
    SELECT
    	(SELECT SUM(Y_Count) FROM Match_All)		CountOfExactMatch,
    	(SELECT SUM(X) FROM Match_Some)			CountofOverlappingMatch,
    	(SELECT COUNT(DISTINCT X) FROM Match_None)	CountNoMatches
    GO

    2018년 7월 10일 화요일 오후 12:36
    중재자
  • am getting this error when i used query with my acual tables

    'STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.'

    2018년 7월 10일 화요일 오후 1:12
  • and also as  i see from query exact match doesnt means exact count match i mean exact values match

    example if x value 2 has --> 3, 4

    similarly x value 3 also has --->3,4 

    so x values comes in exact match result as 2 not because both y values count of 2 but both y values for 2 and 3 are exactly same

    2018년 7월 10일 화요일 오후 1:44
  • It is difficult to provide a solution without a complete range of data.

    >'STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.'

    That's a lot of integers. Anyway, just cast the field to VARCHAR(MAX) to work. So "STRING_AGG(Y, ' ') WITHIN GROUP(ORDER BY Y)" would become "STRING_AGG(CAST(Y AS VARCHAR(MAX)), ' ') WITHIN GROUP(ORDER BY Y)".

    > i see from query exact match doesnt means exact count match i mean exact values match

    Can you give an example with data?

    If i wrote and understood it correctly, Aggregate will create aggregates for each value of X. Then, Match_All will return the amount of times each aggregate is repeated, as long as it is more than once. For example, if X and Y exactly match, and A, B, and C also exactly match, this will return two records, with a 2 and a 3 to match the values.

    After that, the main query will SUM() the numbers together, in my example totalling 5, because of the 2 from the X/Y match, and 3 from the A/B/ C match.

    If this is not what you want, please supply some more data that will show the problem and what the expected result is.

    2018년 7월 10일 화요일 오후 2:30
    중재자
  • Id    X      Y

    1      2       3

    2      2       4

    3      3       3

    4      3       4

    5       4      4

    6       5       5

    let me give more clarity 

    so here i need count of all x which contains exact same values of y

    example --> x value 2 contains y values 3& 4 

    similarly x value 3 also contains y value 3 & 4 so the count of exact match for this data will be 2 as there are no other x value having y values exactly 3 & 4

    similarly x values of 2 ,3,4  contains at least one matching y value that is 4 ( i mean to say x values 2,3,4 all contains y value 4 which is at least one match ) so overlappingcountMatch is 3 here (i.e values 2,3,4) 

    but for x value 5 which does't contain any other x value having y value of 5 so it is non overlapping count of 1

    2018년 7월 11일 수요일 오전 4:58
  • To help me understand, please show what the results were and what they should have been.

    Also, what if the following was the dataset:

    Id x y
    1 1 11
    2 1 12
    3 2 11
    4 2 12
    5 3 13
    6 3 14
    7 4 13
    8 3 14

    In this case, x 1 and 2 match (they are both 11, 12) and also x 3 and 4 match (both being 13, 14).

    2018년 7월 11일 수요일 오전 10:28
    중재자
  • I assume id 8 value for x is 4 if  --->   x 3 and 4 match (both being 13, 14). is the case

    now exactmatchcount is 4 

    Overlapping match count is 4 

    non overlapmatchcount is 0

    2018년 7월 11일 수요일 오후 1:09
  • >I assume id 8 value for x is 4 if  --->   x 3 and 4 match (both being 13, 14). is the case

    Yes. Oops.

    >Overlapping match count is 4

    Oh, not exclusive. That's easy to fix. Just remove the AND NOT EXISTS in Match_Some.

    One more question, what would be the result for the following dataset:

    QuestionId x y

    1 1 11
    2 1 12
    3 2 11
    4 2 12
    5 3 11
    6 3 12


    2018년 7월 11일 수요일 오후 4:08
    중재자
  • >> I have a table A as follows <<

    This is not a valid table. y_coordou have magical generic "nothing in particular_id" instead of a real key_coord. This cannot be an INTEGER NOT NULLeger by_coord definition, since it has to be part of a nominal scale. Let's  clean it up a little bit and add an insertion statement.

    CREATE TABLE Locations
     (location_id CHAR(2) NOT NULL PRIMARY KEY, 
     x_coord INTEGER NOT NULL, 
     y_coord INTEGER NOT NULL);
     
    INSERT INTO Locations
    VALUES
    ('001', 2, 3), 
    ('002', 2, 4), 
    ('003', 3, 3), 
    ('004', 3, 4), 
    ('005', 4, 4), 
    ('006', 5, 5);

    Google "relational division" and look at the various solutions will find there. 

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    2018년 7월 11일 수요일 오후 5:06
  • Exactmatchcount -->3

    OverlappingMAtchcount -->3

    NonOverLappingmatchcount-->0

    2018년 7월 13일 금요일 오전 4:44
  • I did not test all the cases, but i rewrote it with the original test case in mind. Here are the fixes:

    1. STRING_AGG() uses CAST to avoid the limit.
    2. In both Aggregate and Match_All, removed X from the SELECT clause.
    3. Match_Some and Match_None do COUNT(DISTINCT)
    4. Removed the aggregates for Match_Some and Match_None from the main query part.
    5. Removed the NOT EXISTS from Match_Some.
    6. Added comments.

    The main issue was Match_Some was missing a DISTINCT. It became obvious as i reworked and commented on the query.

    WITH
    	A(Id, X, Y)
    AS
    	(
    	 SELECT 1, 2, 3 UNION ALL
    	 SELECT 2, 2, 4 UNION ALL
    	 SELECT 3, 3, 3 UNION ALL
    	 SELECT 4, 3, 4 UNION ALL
    	 SELECT 5, 4, 4 UNION ALL
    	 SELECT 6, 5, 5
    	),
    	Aggregate(Y)
    AS
    	(
    	 -- Preparatory step for Match_All.
    	 -- Aggregate Y as a string to matchup with other sets of X.
    	 -- Order is required to guarantee two string with the same Ys match.
    	 SELECT
    		STRING_AGG
    		(
    		  CAST(Y AS VARCHAR(MAX)),
    		  ' '
    		)
    		WITHIN GROUP(ORDER BY Y)
    	 FROM
    		A
    	 GROUP BY
    		X
    	),
    	Match_All(Y)
    AS
    	(
    	 -- Total matches of each X, excluding those with no matches.
    	 -- This returns one record per X, so a true total must SUM() this.
    	 SELECT
    		COUNT(Y)
    	 FROM
    		Aggregate
    	 GROUP BY
    		Y
    	 HAVING
    		COUNT(Y) > 1
    	),
    	Match_Some(X)
    AS
    	(
     	 -- Count each X that has at least one matching Ys in at least one other X.
    	 SELECT
    		COUNT(DISTINCT X)
    	 FROM
    		A A1
    	 WHERE
    		-- Look for different X, same Y.
    		EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			A A2
    		 WHERE
    			A2.X <> A1.X
    		   AND	A2.Y = A1.Y
    		)
    	),
    	Match_None(X)
    AS
    	(
    	 -- Count each X that has no matching Ys in any other X.
    	 SELECT
    		COUNT(DISTINCT X)
    	 FROM
    		A A1
    	 WHERE
    		-- Look for different X, same Y.
    		NOT EXISTS
    		(
    		 SELECT
    			*
    		 FROM
    			A A2
    		 WHERE
    			A2.X <> A1.X
    		   AND	A2.Y = A1.Y
    		)
    	)
    SELECT
    	-- SUM() Match_All, as it returns one record per X.
    	(SELECT SUM(Y) FROM Match_All)	CountOfExactMatch,
    	(SELECT X FROM Match_Some)	CountofOverlappingMatch,
    	(SELECT X FROM Match_None)	CountNoMatches
    GO

    2018년 7월 13일 금요일 오전 11:54
    중재자