none
Data Concatenation

    Question

  • I've provided a sample table and some sample data below. The plan is to return Person_Id, and the favorite color(s) for each person. The below query works. However, I've recently been told that mixing XML and SQL together results in "poor performance, no optimization, and unmaintainable code".

    So what is the correct way to write the below statement?

    CREATE TABLE #Example_Table ( Person_Id INT ,
    							  Favorite_Color VARCHAR(20) )
    INSERT	INTO #Example_Table
            ( Person_Id, Favorite_Color )
    VALUES  ( 1, 'Red'  ) ,
    		( 1, 'Blue' ) ,
    		( 2, 'Green' ) ,
    		( 3, 'Blue' ) ,
    		( 3, 'Green' )
    
    SELECT  Person_Id ,
            SUBSTRING(( SELECT  ( '; ' + Favorite_Color )
                        FROM    #Example_Table AS et2
                        WHERE   et2.Person_Id = et.Person_Id
                      FOR
                        XML PATH('') ), 3, 100)
    FROM    #Example_Table AS et
    GROUP BY Person_Id
    
    DROP TABLE #Example_Table


    Lefka

    Thursday, June 27, 2013 3:02 PM

All replies

  • What volumes are we talking here? I've never really experienced any performance issues over simple XML path queries like this, but then again, I've never used it for anything larger than a few hundred/few thousand rows.
    Thursday, June 27, 2013 3:13 PM
  • The volume is typically less than 100 parent rows with less than 3 rows being concatenated per parent row.

    Lefka

    Thursday, June 27, 2013 3:16 PM
  • Hi Lefka,

    We can add a clustered index on Person_Id column, you can compare the execution plan and their costs of the following codes: 

    CREATE TABLE Example_Table ( Person_Id INT ,
    							  Favorite_Color VARCHAR(20) )
    INSERT	INTO Example_Table
            ( Person_Id, Favorite_Color )
    VALUES  ( 1, 'Red'  ) ,
    		( 1, 'Blue' ) ,
    		( 2, 'Green' ) ,
    		( 3, 'Blue' ) ,
    		( 3, 'Green' )
    
    CREATE TABLE Example_Table2 ( Person_Id INT ,
    							  Favorite_Color VARCHAR(20) )
    INSERT	INTO Example_Table2
            ( Person_Id, Favorite_Color )
    VALUES  ( 1, 'Red'  ) ,
    		( 1, 'Blue' ) ,
    		( 2, 'Green' ) ,
    		( 3, 'Blue' ) ,
    		( 3, 'Green' )
    
    create clustered index idx_ci_Person_Id on Example_Table2(Person_Id)
    		
    SELECT  Person_Id ,
            SUBSTRING(( SELECT  ( '; ' + Favorite_Color )
                        FROM    Example_Table AS et2
                        WHERE   et2.Person_Id = et.Person_Id
                      FOR
                        XML PATH('') ), 3, 100)
    FROM    Example_Table AS et
    GROUP BY Person_Id
    
    
    SELECT  Person_Id ,
            SUBSTRING(( SELECT  ( '; ' + Favorite_Color )
                        FROM    Example_Table AS et2
                        WHERE   et2.Person_Id = et.Person_Id
                      FOR
                        XML PATH('') ), 3, 100)
    FROM    Example_Table2 AS et
    GROUP BY Person_Id
    
    DROP TABLE Example_Table
    DROP TABLE Example_Table2
    


    Allen Li
    TechNet Community Support

    Saturday, June 29, 2013 6:26 AM
    Moderator
  • Take a look at example 4 at this blog:

    http://www.sqlusa.com/bestpractices/training/scripts/commadelimitedlist/

    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

    Saturday, June 29, 2013 7:52 AM
    Moderator
  • After reviewing the post, I wrote the query using STUFF instead of SUBSTRING. The resulting execution plans were identical and the performance was of course identical.

    @ Allen Li - The actual table (not the example table I provided) is already indexed. Regardless, I was more interested in the proper method for getting the data into the desired format.

    It seems as though the method I'm using is pretty industry standard so I will stick with it.

    Thanks


    Lefka

    Monday, July 01, 2013 3:43 PM