T-SQL: Split String with a Twist

T-SQL: Split String with a Twist

Introduction

 
String splitting is one of the most common problems in the Transact-SQL and programming in general. There are many blog posts discussing splitting a comma-delimited string in SQL. My favorites could be found at T-SQL Useful Links / String Split. Yet recent MSDN Transact-SQL forum post SQL-Split String brought an interesting twist to this problem and thus I want to discuss it in this article.

Problem Definition


The topic starter was kind enough to provide a table's DDL along with some data.

CREATE TABLE #ParseString ( id INT, string VARCHAR(128) )
INSERT #ParseString VALUES (1,'A6782839 A1768983A3A6A362727323AD6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366')
INSERT #ParseString VALUES (2,'A6783834A2748983A3A4A362723313AD6A64302717-D354-48D4-9E20-ADC11EF59A1332222')

as well as the desired output

SELECT id,string,6782839 AS Storeid,1768983 AS ProductID,3 AS ProductTypeLookup,6 AS LineOfBusiness, 362727323 AS ManagerKey, 'D6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366' Tuid
FROM #ParseString
WHERE id=1
UNION
SELECT id,string,6783834 AS Storeid,2748983 AS ProductID,3 AS ProductTypeLookup,4 AS LineOfBusiness, 362723313  AS ManagerKey, 'D6A64302717-D354-48D4-9E20-ADC11EF59A1332222' Tuid
FROM #ParseString
WHERE id=2

The string needed to be split using 'A' as a splitting symbol into 6 parts. Last part also included A as part of the string.

Solution


The first idea that came to mind was to use any of the splitting functions available to split using 'A' character as a splitting character. However, I decided against that idea because of the last part where I would have to concatenate the rest of the string. Therefore I decided to use recursive common table expression to do the splitting that would include both the word and the rest of the string. After that I used CASE based pivot idea to get the desired result:

;
 
WITH cte
AS (
    SELECT id
        ,string
        ,substring(string, 2, charindex('A', substring(string, 2, len(string) - 1)) - 1) AS Word
        ,substring(string, charindex('A', substring(string, 2, len(string) - 1)) + 2, len(string)) AS Rest
        ,0 AS [Level]
    FROM #ParseString
     
    UNION ALL
     
    SELECT id
        ,string
        ,substring(Rest, 1, charindex('A', rest) - 1) AS word
        ,substring(Rest, charindex('A', rest) + 1, len(rest)) AS Rest
        ,[Level] + 1
    FROM cte
    WHERE Rest LIKE '%A%'
    )
SELECT id
    ,string
    ,max(CASE
            WHEN [Level] = 0
                THEN Word
            END) AS StoreID
    ,max(CASE
            WHEN [Level] = 1
                THEN Word
            END) AS ProductID
    ,max(CASE
            WHEN [Level] = 2
                THEN Word
            END) AS ProductTypeLookup
    ,max(CASE
            WHEN [Level] = 3
                THEN Word
            END) AS LineOfBusiness
    ,max(CASE
            WHEN [Level] = 4
                THEN Word
            END) AS ManagerKey
    ,max(CASE
            WHEN [Level] = 4
                THEN Rest
            END) AS tUID
FROM cte
GROUP BY id
    ,String

which produces our desired result:

 

 In this solution we are using CHARINDEX and SUBSTRING functions continuously to get each word and the rest of the string. That rest of the string is used for the tUID column.

Conclusion


As we saw in that problem, sometimes it makes sense to apply recursive solution directly rather than using the existing splitting solution with concatenation at the end. It will be interesting to hear from the topic starter which of the suggested solutions performed better.
 

See Also


This article participated in the TechNet Guru Contributions for March, 2014 and won Bronze medal.
Sort by: Published Date | Most Recent | Most Useful
Comments
  • Hello Naomi,

          One more way as below:

    CREATE TABLE #ParseString ( id INT, string VARCHAR(128) )

    INSERT #ParseString VALUES (1,'A6782839 A1768983A3A6A362727323AD6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366')

    INSERT #ParseString VALUES (2,'A6783834A2748983A3A4A362723313AD6A64302717-D354-48D4-9E20-ADC11EF59A1332222')

    SELECT id,string,6782839 AS Storeid,1768983 AS ProductID,3 AS ProductTypeLookup,6 AS LineOfBusiness, 362727323 AS ManagerKey, 'D6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366' Tuid

    FROM #ParseString

    WHERE id=1

    UNION

    SELECT id,string,6783834 AS Storeid,2748983 AS ProductID,3 AS ProductTypeLookup,4 AS LineOfBusiness, 362723313  AS ManagerKey, 'D6A64302717-D354-48D4-9E20-ADC11EF59A1332222' Tuid

    FROM #ParseString

    WHERE id=2

    select id,string,substring(string, 2, Search3.Pos-1 - Search2.Pos ),substring(string, Search2.Pos + 1, Search3.Pos - Search2.Pos - 1)

    ,substring(string, Search3.Pos + 1, Search4.pos - Search3.Pos   - 1),substring(string, Search4.Pos + 1, Search5.pos - Search4.Pos   - 1)

    ,substring(string, Search5.Pos + 1, Search6.pos - Search5.Pos   - 1),substring(string, Search7.Pos , LEN(string))

    from #ParseString

     cross apply (select (charindex('A', string))) as Search1(Pos)

     cross apply (select (charindex('A', string, Search1.Pos+1))) as Search2(Pos)

     cross apply (select (charindex('A', string, Search2.Pos+1))) as Search3(Pos)

     cross apply (select (charindex('A', string, Search3.Pos+1))) as Search4(Pos)

     cross apply (select (charindex('A', string, Search4.Pos+1))) as Search5(Pos)

     cross apply (select (charindex('A', string, Search5.Pos+1))) as Search6(Pos)

     cross apply (select (charindex('D', string, Search6.Pos+1))) as Search7(Pos)

    Drop table #ParseString

  • Hi Naomi,

    Very interesting. Based on your requirement, I think it can be done just by a single select statement with patindex and substring as below:

    --Sample table starts here

    CREATE TABLE #ParseString ( id INT, string VARCHAR(128) )

    INSERT #ParseString VALUES (1,'A6782839 A1768983A3A6A362727323AD6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366')

    INSERT #ParseString VALUES (2,'A6783834A2748983A3A4A362723313AD6A64302717-D354-48D4-9E20-ADC11EF59A1332222')

    --Solution Starts here

    Declare @MidString as varchar(128);

    Select @MidString =

    Replace(

    Substring(string

    , patindex('%[^0-9][^A-z]%', substring(string, 2,patindex('%A[^0-9]%',string))) + 2

    , patindex('%A[^0-9]%',string) -  patindex('%[^0-9][^A-z]%', substring(string, 2,patindex('%A[^0-9]%',string))) -2

    )

    , 'A', '.'

    )

    FROM #ParseString;

    --Print @midString

    Select

    substring(string, 2, patindex('%[^0-9][^A-z]%', substring(string, 2,patindex('%A[^0-9]%',string))) - 1) As StoreID

    , parsename(@midstring,4) As ProductID, parsename(@midstring,3) As ProductTypeLookup, parsename(@midstring,2) As LineofBusiness, parsename(@midstring,1) As ManagerKey

    , stuff(string,1,patindex('%A[^0-9]%',string),'') As TUID

    FROM #ParseString;

  • Sorry, should be this:

    --Sample table starts here

    CREATE TABLE #ParseString ( id INT, string VARCHAR(128) )

    INSERT #ParseString VALUES (1,'A6782839 A1768983A3A6A362727323AD6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366')

    INSERT #ParseString VALUES (2,'A6783834A2748983A3A4A362723313AD6A64302717-D354-48D4-9E20-ADC11EF59A1332222')

    --Solution Starts here

    ;With CTE

    as

    (

    Select String, MidString =

    Replace(

    Substring(string

    , patindex('%[^0-9][^A-z]%', substring(string, 2,patindex('%A[^0-9]%',string))) + 2

    , patindex('%A[^0-9]%',string) -  patindex('%[^0-9][^A-z]%', substring(string, 2,patindex('%A[^0-9]%',string))) -2

    )

    , 'A', '.'

    )

    FROM #ParseString

    )

    Select

    substring(string, 2, patindex('%[^0-9][^A-z]%', substring(string, 2,patindex('%A[^0-9]%',string))) - 1) As StoreID

    , parsename(midstring,4) As ProductID, parsename(midstring,3) As ProductTypeLookup, parsename(midstring,2) As LineofBusiness, parsename(midstring,1) As ManagerKey

    , stuff(string,1,patindex('%A[^0-9]%',string),'') As TUID

    FROM CTE;

  • Nice solution Steven Wang, much more efficient in terms of resources / performance 15% vs 85% (checked on SQL 2012).

Page 1 of 1 (4 items)